Abstract. In this paper, we address the problem of simultaneous classification and estimation of
hidden parameters in a sensor network with communications constraints. In particular, we consider 
a network of noisy sensors which measure a common scalar unknown parameter. We assume that 
a fraction of the nodes represent faulty sensors, whose measurements are poorly reliable. The goal 
for each node is to simultaneously identify its class (faulty or non-faulty) and estimate the common 
parameter. 

We propose a novel cooperative iterative algorithm which copes with the communication con- 
straints imposed by the network and shows remarkable performance. Our main result is a rigorous 
proof of the convergence of the algorithm and a characterization of the limit behavior. We also 
show that, in the limit when the number of sensors goes to infinity, the common unknown parameter
is estimated with arbitrarily small error, while the classification error converges to that of the
optimal centralized maximum likelihood estimator. We also show numerical results that validate
the theoretical analysis and support their possible generalization. We compare our strategy with 
the Expectation-Maximization algorithm and we discuss trade-offs in terms of robustness, speed of 
convergence and implementation simplicity. 

Key words. Classification, Consensus, Gaussian mixture models, Maximum-likelihood estima- 
tion, Sensor networks, Switching systems. 

1. Introduction. Sensor networks are one of the most important technologies 
introduced in our century. Promoted by the advances in wireless communications 
and by the pervasive diffusion of smart sensors, wireless sensor networks are largely 
used nowadays for a variety of purposes, e.g., environmental and habitat surveillance, 
health and security monitoring, localization, targeting, event detection. 

A sensor network basically consists of a large number of small
devices, called sensors, that are able to perform measurements and simple
computations, to store small amounts of data, and to communicate with other devices.
In this paper, we focus on ad hoc networks, in which communication is local: each
sensor is connected only with a restricted number of other sensors. This kind of
cooperation allows elaborate operations to be performed in a self-organized way, with no
centralized supervision or data fusion center, and with substantial energy and economic
savings on processors and communication links. This makes it possible to build large sensor
networks at contained cost.

A problem that can be addressed through ad hoc sensor networking is the dis- 
tributed estimation: given an unknown physical parameter (e.g., the temperature in 
a room, the position of an object), one aims at estimating it using the sensing ca- 
pabilities of a network. Each sensor performs a (not exact) measurement and shares 
it with the sensors with which it can establish a communication; in turn, it receives 
information and consequently updates its own estimate. If the network is connected, 
by iterating the sharing procedure, the information propagates and a consensus can
be reached. Neither centralized coordinator nor data fusion center is present. The 
mathematical model of this problem must envisage the presence of noise in measure- 
ments, which are naturally corrupted by inaccuracies, and possible constraints on the 
network in terms of communication, energy or bandwidth limitations, and of necessity 
of quantization or data compression. 

Distributed estimation in ad hoc sensor networks has been widely studied in the 
literature. For the problem of estimating an unknown common parameter, a typical
approach is to consider distributed versions of classical maximum likelihood (ML)
or maximum-a-posteriori (MAP) estimators. Decentralization can be obtained, for 
instance, through consensus type protocols (see [1], [2], [3]) adapted to the communi- 
cation graph of the network, or by belief propagation methods [4] and [5]. 

A second important issue is sensor classification, which we define as follows [6].
Let us imagine that sensors can be divided into different classes according to peculiar
properties, e.g., measurement or processing capabilities, and that no sensor knows
to which class it belongs: by classification, we then mean the labeling procedure
that each sensor undertakes to determine its affiliation. This task serves a
variety of clustering purposes, for example, to rebalance the computation load in a
network where sensors can be distinguished according to their processing power. In
most cases, sensor classification is tackled through some distributed estimation,
the underlying idea being the following: each sensor performs its measurement of a 
parameter, then iteratively modifies it on the basis of information it receives; during 
this iterative procedure the sensor learns something about itself which makes it able 
to estimate its own configuration. 

In this paper, we consider the following model: each sensor $i$ performs a measurement
$y_i = \theta^* + \omega_i^* \eta_i$, where $\theta^* \in \mathbb{R}$ is the unknown global parameter, $\omega_i^* > 0$ is the
unknown status of the sensor, and $\eta_i$ is a Gaussian random noise. The larger $\omega_i^*$
is, the more sensor $i$ is malfunctioning, that is, the lower the quality of its measurement.
The parameter $\omega_i^*$ is supposed to belong to a discrete set; in particular, in this
paper we consider the binary case.

The goal of each unit $i$ is to estimate the parameter $\theta^*$ and its specific configuration
$\omega_i^*$. The presence of the common unknown parameter $\theta^*$ imposes a coupling
between the different nodes and makes the problem interesting.

An additive version of the aforementioned model has been studied in [7], where the
measurement is given by $y_i = \theta^* + \omega_i^* + \eta_i$. Another related problem is the so-called
calibration problem [8,9]: sensor $i$ performs a noisy linear measurement $y_i = A_i\theta + \eta_i$,
where the unknown $\theta$ and $A_i$ are a vector and a matrix, respectively, while $\eta_i$ is a
noise; the goal consists in the estimation of $\theta$ and of $A_i$, the latter being known as the
calibration problem.

All these are particular cases of the problem of the estimation of Gaussian mix- 
tures' parameters [10, 11]. This perspective has been studied for sensor networks 
in [12], [13], [14], and [15] where distributed versions of the Expectation-Maximization 
(EM) algorithm have been proposed. A network is given where each node inde- 
pendently performs the E-step through local observations. In particular, in [14] a 
consensus filter is used to propagate the local information. The tricky point of such 
techniques is the choice of the number of averaging iterations between two consecutive 
M-steps, which must be sufficient to reach consensus. 

The aim of this paper is the development of a distributed, iterative procedure
which copes with the communication constraints imposed by the network and computes
an estimate $(\hat\theta, \hat\omega)$ approximating the maximum likelihood optimal solution
of the proposed problem. The core of our methodology is an Input Driven Consensus
Algorithm (IA for short), introduced in [16], which takes care of the estimation of
the parameter $\theta^*$. IA is coupled with a classification step where nodes update the
estimate of their own type $\omega_i^*$ by a simple threshold estimator based on the current
estimate of $\theta^*$. Using a consensus protocol that works on inputs instead of,
as is more common, on initial conditions is a key strategic choice: it serves the purpose of
using the innovation coming from the units that are modifying the estimate of their
status as time passes. Our main theoretical contribution is a complete analysis of
the algorithm in terms of convergence and of behaviour with respect to the size of the
network. This makes an important difference with respect to other approaches, like
distributed EM, for which convergence results are missing. We also present a number of
numerical simulations showing the remarkable performance of the algorithm, which
in many situations outperforms classical choices like EM.

The outline of the paper is the following. In Section 2 we briefly present some
graph nomenclature needed in the paper. Section 3 is devoted to a formal description
of the problem and to a discussion of the classical centralized maximum likelihood
solution. In Section 4, we present the details and the analysis of our IA. Our main
results are Theorems 4.1 and 4.2: Theorem 4.1 ensures that, under suitable assumptions
on the graph, the algorithm converges to a local maximum of the log-likelihood
function; Theorem 4.2 is a concentration result establishing that when the number
of nodes $N \to +\infty$, the estimate $\hat\theta$ converges to the true value $\theta^*$ (a sort of asymptotic
consistency). Finally, we also study the behavior of the discrete estimate $\hat\omega$
by analyzing, as a performance index, the relative classification error over the network
when $N \to +\infty$ (see Proposition 4.4). Section 5 contains a set of numerical simulations
carried out on different graph architectures: complete, circulant, grids, and random geometric
graphs. Comparisons are proposed with respect to the optimal centralized ML
solution and also with respect to the EM solution. Finally, a long Appendix contains
all the proofs.

2. General notation and graph theoretical preliminaries. Throughout
this paper, we use the following notational convention. We denote vectors with small
letters, and matrices with capital letters. Given a matrix $M$, $M^T$ denotes its transpose.
Given a vector $v$, $\|v\|$ denotes its Euclidean norm. $\mathbb{1}_A$ is the indicator function
of set $A$. Given a finite set $V$, $\mathbb{R}^V$ denotes the space of real vectors with components
labelled by elements of $V$. Given two vectors $x, z \in \mathbb{R}^V$, $d_H(x, z) = |\{i \in V : x_i \neq z_i\}|$.
We use the convention that a summation over an empty set of indices is equal to zero,
while a product over an empty set gives one.

A symmetric graph is a pair $\mathcal{G} = (V, \mathcal{E})$ where $V$ is a set, called the set of vertices,
and $\mathcal{E} \subseteq V \times V$ is the set of edges with the property that $(i, i) \notin \mathcal{E}$ for all $i \in V$ and
$(i, j) \in \mathcal{E}$ implies $(j, i) \in \mathcal{E}$. $\mathcal{G}$ is strongly connected if, for all $i, j \in V$, there exist
vertices $i_1, \dots, i_s$ such that $(i, i_1), (i_1, i_2), \dots, (i_s, j) \in \mathcal{E}$. To any symmetric matrix
$P \in \mathbb{R}^{V \times V}$ with non-negative elements, we can associate a graph $\mathcal{G}_P = (V, \mathcal{E}_P)$ by
putting $(i, j) \in \mathcal{E}_P$ if and only if $P_{ij} > 0$. $P$ is said to be adapted to a graph
$\mathcal{G}$ if $\mathcal{G}_P \subseteq \mathcal{G}$. A matrix $P$ with non-negative elements is said to be stochastic if
$\sum_{j \in V} P_{ij} = 1$ for every $i \in V$. Equivalently, denoting by $\mathbb{1}$ the vector of all ones in $\mathbb{R}^V$,
$P$ is stochastic if $P\mathbb{1} = \mathbb{1}$. $P$ is said to be primitive if there exists $n_0 \in \mathbb{N}$ such that
$(P^{n_0})_{ij} > 0$ for every $i, j \in V$. A sufficient condition ensuring primitivity is that $\mathcal{G}_P$ is
strongly connected and $P_{ii} > 0$ for some $i \in V$.

3. Bayesian modeling for estimation and classification. 
3.1. The model. In our model, we consider a network, represented by a symmetric
graph $\mathcal{G} = (V, \mathcal{E})$. $\mathcal{G}$ represents the system communication architecture. We
denote the number of nodes by $N = |V|$. We assume that each node $i \in V$ measures
the observable

$$y_i = \theta^* + \omega_i^* \eta_i \tag{3.1}$$

where $\theta^* \in \mathbb{R}$ is an unknown parameter, the $\eta_i$'s are Gaussian noises $\mathcal{N}(0, 1)$, and the $\omega_i^*$'s are Bernoulli
random variables taking values in $\{\alpha, \beta\}$ (with $P(\omega_i^* = \beta) = p$). We assume all the
random variables $\eta_i$'s and $\omega_i^*$'s to be mutually independent. Notice that each $y_i \in \mathbb{R}$
is a Gaussian mixture distributed according to the probability density function

$$f(y_i) = (1 - p) f(y_i | \theta^*, \alpha) + p f(y_i | \theta^*, \beta) \tag{3.2}$$

$$f(y_i | \theta^*, x) = \frac{1}{x\sqrt{2\pi}} e^{-\frac{(y_i - \theta^*)^2}{2x^2}}, \qquad x \in \{\alpha, \beta\}. \tag{3.3}$$

The binary model of $\omega_i^*$ is motivated by different scenarios: as an example, if
$0 < \alpha \ll \beta$, the nodes of type $\beta$ may represent a subset of faulty sensors, whose measurements
are poorly reliable; the aim may be the detection of faulty sensors in order to switch
them off or neglect their measurements, or for other clustering purposes. It is also
realistic to assume that some a-priori information about the quantity of faulty sensors
is extracted, e.g., from experimental data on the network, and it is conceivable to
represent such information as an a-priori distribution. This is why we assume a
Bernoulli distribution on each $\omega_i^*$; on the other hand, we suppose that no a-priori
information is available on the unknown parameter $\theta^*$. However, the addition of an
a-priori probability distribution on $\theta^*$ does not significantly alter our analysis and our
results.
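
To make the measurement model (3.1) concrete, the following minimal Python sketch draws synthetic measurements. It is only an illustration: the function name `sample_measurements` and the `seed` argument are hypothetical choices of ours and do not come from the original work.

```python
import random

def sample_measurements(n, theta_star, alpha, beta, p, seed=0):
    """Draw n measurements y_i = theta* + omega_i* * eta_i (model (3.1)).

    Hypothetical helper: names and the seed argument are illustrative only.
    """
    rng = random.Random(seed)
    ys, omegas = [], []
    for _ in range(n):
        # omega_i* equals beta (faulty) with probability p, alpha otherwise
        omega = beta if rng.random() < p else alpha
        eta = rng.gauss(0.0, 1.0)  # standard Gaussian noise
        omegas.append(omega)
        ys.append(theta_star + omega * eta)
    return ys, omegas
```

For example, `sample_measurements(100, 0.0, 0.3, 10.0, 0.25)` would mimic the parameter values used in the simulations of Section 5.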

3.2. The maximum likelihood solution. The goal is to estimate the parameter
$\theta^*$ and the specific configuration $\omega_i^*$ of each unit. Disregarding the network
constraints, a natural solution to our problem would be to consider a joint ML in $\theta^*$
and MAP in the $\omega_i^*$'s (see [17, 18]). Let $f(y, \omega | \theta)$ be the joint distribution of $y$ and $\omega$
(density in $y$ and probability in $\omega$) given the parameter $\theta$, and consider the rescaled
log-likelihood function

$$L_N(\theta, \omega) := \frac{1}{N} \log f(y, \omega | \theta). \tag{3.4}$$

The hybrid ML/MAP solution, which for simplicity from now on we will refer to as the
ML solution, prescribes choosing $\hat\theta$ and $\hat\omega$ which maximize $L_N(\theta, \omega)$:

$$(\hat\theta^{ML}, \hat\omega^{ML}) := \operatorname{argmax}_{(\theta, \omega)} L_N(\theta, \omega). \tag{3.5}$$

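Standard calculations lead us to

$$L_N(\theta, \omega) = -\frac{1}{N} \sum_{i \in V} \frac{(y_i - \theta)^2}{2\omega_i^2} + \frac{1}{N} \sum_{i \in V} \mathbb{1}_{\{\omega_i = \alpha\}} \log\frac{(1 - p)\beta}{p\alpha} + c \tag{3.6}$$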

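where $c$ is a constant. It can be noted that partial maximizations of $L_N(\theta, \omega)$ with
respect to just one of the two variables have simple representations. Let

$$\hat\theta(\omega) := \operatorname{argmax}_{\theta} L_N(\theta, \omega), \qquad \hat\omega(\theta) := \operatorname{argmax}_{\omega} L_N(\theta, \omega). \tag{3.7}$$

Then

$$\hat\theta(\omega) = \frac{\sum_{i \in V} y_i \omega_i^{-2}}{\sum_{i \in V} \omega_i^{-2}}, \qquad \hat\omega(\theta)_i = \begin{cases} \alpha & \text{if } |y_i - \theta| < \delta \\ \beta & \text{otherwise} \end{cases} \tag{3.8}$$

where

$$\delta = \sqrt{\frac{2 \ln\left( \frac{(1 - p)\beta}{p\alpha} \right)}{\frac{1}{\alpha^2} - \frac{1}{\beta^2}}}.$$

The ML solution can then be obtained, for instance, by considering

$$\hat\theta^{ML} = \operatorname{argmax}_{\theta} L_N(\theta, \hat\omega(\theta)), \qquad \hat\omega^{ML} = \hat\omega(\hat\theta^{ML}). \tag{3.9}$$

It should be noted how the computation of the $\hat\omega^{ML}_i$'s becomes totally decentralized
once $\hat\theta^{ML}$ has been computed. For the computation of $\hat\theta^{ML}$, instead, one needs to gather
information from all units to compute $L_N(\theta, \hat\omega(\theta))$, and it is not at all evident how
this can be done in a decentralized way. Moreover, further difficulties are caused by
the fact that $L_N(\theta, \hat\omega(\theta))$ may contain many local maxima, as shown in Figure 3.1.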

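It should be noted that $L_N(\theta, \hat\omega(\theta))$ is differentiable except at a finite number of
points, and between two successive non-differentiable points the function is concave.
Therefore, the local maxima of the function coincide with its critical points. On the
other hand, the derivative, where it exists, is given by

$$\frac{d}{d\theta} L_N(\theta, \hat\omega(\theta)) = \frac{1}{N} \sum_{i \in V} (y_i - \theta) \left[ \frac{1}{\beta^2} + \mathbb{1}_{\{|y_i - \theta| < \delta\}} \left( \frac{1}{\alpha^2} - \frac{1}{\beta^2} \right) \right]. \tag{3.10}$$

Stationary points can therefore be represented by the relation

$$\theta = \frac{\sum_{i \in V} y_i \left[ \frac{1}{\beta^2} + \mathbb{1}_{\{|y_i - \theta| < \delta\}} \left( \frac{1}{\alpha^2} - \frac{1}{\beta^2} \right) \right]}{\frac{N}{\beta^2} + \sum_{i \in V} \mathbb{1}_{\{|y_i - \theta| < \delta\}} \left( \frac{1}{\alpha^2} - \frac{1}{\beta^2} \right)}. \tag{3.11}$$

A moment of thought shows us that (3.11) is equivalent to the relation $\theta = \hat\theta(\hat\omega(\theta))$.
This representation will play a key role in the sequel of this paper.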
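
The two partial maximizers and the fixed-point characterization of stationary points are cheap to evaluate. The sketch below is a hedged illustration, assuming that the estimate of the parameter for a fixed classification is the inverse-variance weighted mean of the measurements, that the classification for a fixed parameter is a threshold rule, and that the threshold is the point where the two Gaussian hypotheses are equally likely a posteriori. All function names are hypothetical.

```python
import math

def delta(alpha, beta, p):
    """Classification threshold; assumes alpha < beta and (1 - p) * beta > p * alpha."""
    num = 2.0 * math.log((1.0 - p) * beta / (p * alpha))
    den = alpha ** -2 - beta ** -2
    return math.sqrt(num / den)

def theta_hat(ys, omegas):
    """Weighted mean of the measurements, with weights omega_i^{-2}."""
    w = [om ** -2 for om in omegas]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def omega_hat(ys, theta, alpha, beta, d):
    """Threshold classifier: alpha if |y_i - theta| < d, beta otherwise."""
    return [alpha if abs(y - theta) < d else beta for y in ys]

def is_stationary(ys, theta, alpha, beta, p, tol=1e-12):
    """Check the fixed-point relation theta = theta_hat(omega_hat(theta))."""
    d = delta(alpha, beta, p)
    return abs(theta_hat(ys, omega_hat(ys, theta, alpha, beta, d)) - theta) < tol
```

A value of the parameter that passes `is_stationary` is a candidate local maximum of the profiled log-likelihood; several such values may coexist for the same data.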

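3.3. Iterative centralized algorithms. Solving the optimization problem (3.5)
exactly is computationally infeasible in most situations. However,
relations (3.8) suggest a simple way to construct an iterative approximation of the
ML solution (which we will denote IML). The formal pattern is the following: fixed
$\hat\omega^{(0)} = \alpha\mathbb{1}$, for $t = 0, 1, \dots$, we consider the dynamical system

$$\hat\theta^{(t+1)} = \hat\theta(\hat\omega^{(t)}) = \frac{\sum_{i \in V} y_i (\hat\omega_i^{(t)})^{-2}}{\sum_{i \in V} (\hat\omega_i^{(t)})^{-2}}$$

$$\hat\omega_i^{(t+1)} = \hat\omega(\hat\theta^{(t+1)})_i = \begin{cases} \alpha & \text{if } |y_i - \hat\theta^{(t+1)}| < \delta \\ \beta & \text{otherwise} \end{cases}$$

for any $i = 1, \dots, N$.
The algorithm stops whenever $|\hat\theta^{(t+1)} - \hat\theta^{(t)}| < \epsilon$, for some fixed tolerance $\epsilon > 0$.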
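
The IML iteration can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the classification is initialized with all nodes of type alpha, each step alternates the weighted-mean update of the parameter with the threshold classification, and the function name `iml` together with the arguments `tol` and `max_iter` are hypothetical.

```python
import math

def iml(ys, alpha, beta, p, tol=1e-9, max_iter=1000):
    """Centralized iterative ML sketch: alternate weighted mean and thresholding."""
    d = math.sqrt(2.0 * math.log((1.0 - p) * beta / (p * alpha))
                  / (alpha ** -2 - beta ** -2))
    omegas = [alpha] * len(ys)  # initial classification: all nodes non-faulty
    theta = None
    for _ in range(max_iter):
        w = [om ** -2 for om in omegas]
        new_theta = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
        omegas = [alpha if abs(y - new_theta) < d else beta for y in ys]
        converged = theta is not None and abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    return theta, omegas
```

On a toy data set with eight measurements close to zero and one outlier, the iteration is expected to flag the outlier after the first step and then settle on a weighted mean dominated by the remaining nodes. The outcome depends on the data: a sufficiently extreme outlier can instead drive all nodes to type beta, which reflects the presence of several local maxima.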

A more refined iterative solution is given by the so-called Expectation-Maximization
(EM) algorithm [19]. The main idea is to introduce a hidden (that is, unknown and unobserved)
random variable in the likelihood; then, at each step, one computes the
mean of the likelihood function with respect to the hidden variable and finds its maximum.
Such a method seeks the maximum likelihood solution, which in many
cases cannot be formulated in a closed form. EM is widely and successfully used in
many frameworks and in principle it could also be applied to our problem. In our
context, letting the variable $\omega$ play the role of the hidden variable, the equations for
EM become (see the tutorial [20] for their derivation)

Given $\hat\theta^{(0)} \in \mathbb{R}$, for $t = 0, 1, \dots$,
1. E-step: for all nodes $i \in V$,

$$q_i^{(t)} = P(\omega_i = \alpha \mid y, \hat\theta^{(t)}) = \frac{(1 - p) f(y_i | \hat\theta^{(t)}, \alpha)}{(1 - p) f(y_i | \hat\theta^{(t)}, \alpha) + p f(y_i | \hat\theta^{(t)}, \beta)}$$

2. M-step:

$$\hat\theta^{(t+1)} = \frac{\sum_{i \in V} y_i \left[ q_i^{(t)} \alpha^{-2} + (1 - q_i^{(t)}) \beta^{-2} \right]}{\sum_{i \in V} \left[ q_i^{(t)} \alpha^{-2} + (1 - q_i^{(t)}) \beta^{-2} \right]}$$

The algorithm stops whenever $|\hat\theta^{(t+1)} - \hat\theta^{(t)}| < \epsilon$, for some fixed tolerance $\epsilon > 0$. It
is worth noticing that $q_i^{(t)}$ computed in the E-step is actually the expectation of the
binary random variable $\mathbb{1}_{\{\omega_i = \alpha\}}$, conditional on $y$ and $\hat\theta^{(t)}$. On the other hand, $\hat\theta^{(t+1)}$ computed in the M-step
is the maximizer of the corresponding expected log-likelihood.
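
For comparison, the EM recursion for this two-component mixture with known variances and known prior can be sketched as follows. This is an illustrative sketch only, assuming that the E-step computes the posterior probability of type alpha for each node and that the M-step is a weighted mean with weights given by the expected inverse variances; the names `gauss_pdf`, `em`, `theta0`, `tol` and `max_iter` are hypothetical.

```python
import math

def gauss_pdf(y, theta, x):
    """Gaussian density with mean theta and standard deviation x."""
    return math.exp(-(y - theta) ** 2 / (2.0 * x * x)) / (x * math.sqrt(2.0 * math.pi))

def em(ys, alpha, beta, p, theta0, tol=1e-9, max_iter=1000):
    """EM sketch for the common parameter; returns (theta, posteriors q)."""
    theta = theta0
    q = []
    for _ in range(max_iter):
        # E-step: posterior probability that node i is of type alpha
        q = []
        for y in ys:
            a = (1.0 - p) * gauss_pdf(y, theta, alpha)
            b = p * gauss_pdf(y, theta, beta)
            q.append(a / (a + b))
        # M-step: weighted mean with expected inverse variances as weights
        w = [qi * alpha ** -2 + (1.0 - qi) * beta ** -2 for qi in q]
        new_theta = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    return theta, q
```

Unlike IML, the soft weights `q` never commit fully to one class, so a hard classification has to be extracted afterwards, for instance by thresholding `q` at one half (an illustrative choice of ours).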

An important feature of EM is that it is possible to prove the convergence of the
sequence $\{\hat\theta^{(t)}\}_{t \in \mathbb{N}}$ to a local maximum of the expected value of the log-likelihood with
respect to the unknown data $\omega$, a result which is instead not directly available for IML.
Both algorithms, however, share the drawback of requiring centralization. Distributed
versions of EM have been proposed (see, e.g., [12], [14]), but convergence is not
guaranteed for them. In Section 5 we will compare both these algorithms against
the distributed IA we are going to present in the next section. While EM always
outperforms IML in our simulations, IA outperforms both of them for small
networks, while it shows performance comparable to EM for large networks.

4. Input driven consensus algorithm. 

4.1. Description of the algorithm. In this section we propose a distributed
iterative algorithm approximating the centralized ML estimator. The algorithm is
suggested by the expressions in (3.8) and consists of the iteration of two steps: an
averaging step, where all units aim at computing $\hat\theta$ through a sort of Input Driven
Consensus Algorithm (IA), followed by an update of the classification estimate performed
autonomously by all units.

Formally, IA is parametrized by a symmetric stochastic matrix $P$, adapted to the
communication graph $\mathcal{G}$ ($P_{ij} > 0$ if and only if $(i, j) \in \mathcal{E}$), and by a real sequence
$\gamma^{(t)} \to 0$. Every node $i$ has three messages stored in its memory at time $t$, denoted
by $\mu_i^{(t)}$, $\nu_i^{(t)}$, and $\hat\omega_i^{(t)}$. Given the initial conditions $\mu_i^{(0)} = 0$, $\nu_i^{(0)} = 0$, and the initial
estimate $\hat\omega_i^{(0)} = \alpha$, the dynamics consists of the following steps.

1. Average step:

$$\mu_i^{(t+1)} = (1 - \gamma^{(t)}) \sum_{j} P_{ij} \mu_j^{(t)} + \gamma^{(t)} y_i (\hat\omega_i^{(t)})^{-2} \tag{4.1a}$$

$$\nu_i^{(t+1)} = (1 - \gamma^{(t)}) \sum_{j} P_{ij} \nu_j^{(t)} + \gamma^{(t)} (\hat\omega_i^{(t)})^{-2} \tag{4.1b}$$

$$\hat\theta_i^{(t+1)} = \mu_i^{(t+1)} / \nu_i^{(t+1)} \tag{4.1c}$$

2. Classification step:

$$\hat\omega_i^{(t+1)} = \hat\omega(\hat\theta_i^{(t+1)})_i = \begin{cases} \alpha & \text{if } |y_i - \hat\theta_i^{(t+1)}| < \delta \\ \beta & \text{otherwise.} \end{cases} \tag{4.2}$$

It should be noted that the algorithm provides a distributed protocol: each node only 
needs to be aware of its neighbours and no further information about the network 
topology is required. 
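
The two steps of IA can be simulated on a single machine as follows. This is a hedged sketch and not the authors' implementation: we assume zero initial values for the two averaged messages, an initial classification with all nodes of type alpha, a step size equal to the time index raised to a negative power `xi` with the time index starting at 1, and a dense list-of-lists matrix `P`. The name `ia` and its arguments are hypothetical.

```python
import math

def ia(ys, P, alpha, beta, p, xi=0.7, steps=2000):
    """Input driven consensus sketch; returns per-node estimates and classes."""
    n = len(ys)
    d = math.sqrt(2.0 * math.log((1.0 - p) * beta / (p * alpha))
                  / (alpha ** -2 - beta ** -2))
    mu = [0.0] * n          # averaged numerators (assumed initial value 0)
    nu = [0.0] * n          # averaged denominators (assumed initial value 0)
    omega = [alpha] * n     # initial classification: all nodes non-faulty
    theta = [0.0] * n
    for t in range(1, steps + 1):
        g = t ** -xi        # step size; time index starts at 1 by assumption
        # Average step: consensus on neighbours plus local input
        new_mu = [(1.0 - g) * sum(P[i][j] * mu[j] for j in range(n))
                  + g * ys[i] * omega[i] ** -2 for i in range(n)]
        new_nu = [(1.0 - g) * sum(P[i][j] * nu[j] for j in range(n))
                  + g * omega[i] ** -2 for i in range(n)]
        mu, nu = new_mu, new_nu
        theta = [mu[i] / nu[i] for i in range(n)]
        # Classification step: local threshold test on the local estimate
        omega = [alpha if abs(ys[i] - theta[i]) < d else beta for i in range(n)]
    return theta, omega
```

In a real deployment each node would only hold its own row of `P` and exchange the two averaged messages with its neighbours; the full matrix is used here purely for convenience of simulation.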

4.2. Convergence. The following theorem ensures the convergence of IA. The 
proof is rather technical and therefore deferred to Appendix A. 

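Theorem 4.1. Let

(a) $\gamma^{(t)} \to 0$, $\gamma^{(t)} > 1/t$, and $\gamma^{(t)} = \gamma^{(t+1)} + o(\gamma^{(t+1)})$ for $t \to +\infty$;

(b) $P \in \mathbb{R}^{V \times V}$ be a stochastic, symmetric, and primitive matrix with positive
eigenvalues.

Then, there exist $\hat\omega^{IA} \in \{\alpha, \beta\}^V$ and $\hat\theta^{IA} \in \mathbb{R}$ such that
1.

$$\lim_{t \to +\infty} \hat\omega^{(t)} \overset{\text{a.s.}}{=} \hat\omega^{IA}, \qquad \lim_{t \to +\infty} \hat\theta_i^{(t)} \overset{\text{a.s.}}{=} \hat\theta^{IA}$$

for all $i \in V$;
2. they satisfy the relations

$$\hat\theta^{IA} = \hat\theta(\hat\omega^{IA}), \qquad \hat\omega^{IA} = \hat\omega(\hat\theta^{IA}).$$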

A number of remarks are in order. 
• The assumption on the eigenvalues of $P$ is essentially a technical one: in
simulations it does not seem to have a crucial role, but we need it in our proof
of convergence. On the other hand, given any symmetric stochastic primitive
$P$, we can consider a 'lazy' version of it, $P_\tau = (1 - \tau)I + \tau P$, and notice that
for $\tau \in (0, 1)$ sufficiently small, $P_\tau$ will indeed satisfy the assumption on the
eigenvalues.

• The requirement $\gamma^{(t)} > 1/t$ is not new in decentralized algorithms (see for
instance the Robbins-Monro algorithm, introduced in [21]) and serves the
need of keeping the system input 'active' for a sufficiently long time. Less
classical is the assumption $\gamma^{(t)} \sim \gamma^{(t+1)}$, which is essentially a request of
regularity in the decay of $\gamma^{(t)}$ to 0. Possible choices of $\gamma^{(t)}$ satisfying the
above conditions are $\gamma^{(t)} = t^{-\xi}$ for $\xi \in (0, 1)$, or $\gamma^{(t)} = t^{-1} (\ln t)^{a}$ for any
$a > 0$.

• The proof (see Appendix A) will also give an estimate of the speed of
convergence: indeed, it will be shown that $\|\hat\theta^{(t)} - \hat\theta^{IA} \mathbb{1}\| = O(\gamma^{(t)})$ for $t \to \infty$.

• The relations in item 2 imply that $\hat\theta^{IA}$ is a local maximum of the function
$L_N(\theta, \hat\omega(\theta))$ (see (3.11)).
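
The conditions on the step-size sequence can be checked numerically for the two families mentioned in the remarks. The sketch below is only a sanity check at a finite time index, not a proof; the function names are hypothetical, and the power-law and logarithmic forms are the ones we read from the remark on admissible choices.

```python
import math

def gamma_power(t, xi):
    """Power-law step size t^(-xi), intended for xi in (0, 1)."""
    return t ** -xi

def gamma_log(t, a):
    """Logarithmic step size (ln t)^a / t, intended for a > 0 and t > 1."""
    return math.log(t) ** a / t

def check_step_size(gamma, t):
    """Return (gamma(t) > 1/t, ratio gamma(t) / gamma(t + 1)) at a given t."""
    return gamma(t) > 1.0 / t, gamma(t) / gamma(t + 1)
```

For large time indices the first entry should be true and the ratio should approach 1, which is the regularity of the decay requested by the theorem.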

4.3. Limit behavior. In this section we present results on the behavior of our
algorithm for $N \to +\infty$. All quantities derived so far are indeed functions of the network
size $N$. In order to emphasize the role of $N$, we will add an index $N$ when dealing
with quantities like $\hat\theta$ (e.g., $\hat\theta_N^{ML}$). Instead, we will not add anything to expressions
where vectors $\omega$ are involved, since their dimension is itself $N$.

Figure 3.1 shows a sort of concentration of the local maxima of $L_N(\theta, \hat\omega(\theta))$ around
a global maximum for large $N$. Considering that IA converges to a local maximum,
this observation would lead to the conclusion that, for large $N$, the IA solution resembles the
optimal ML solution. This section provides some results which make these
considerations rigorous.

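Notice first that, applying the uniform law of large numbers [22] to the expression
(3.6), we obtain that, for any compact $K \subset \mathbb{R}$, almost surely

$$\lim_{N \to +\infty} \max_{\theta \in K} \left| L_N(\theta, \hat\omega(\theta)) - \int_{\mathbb{R}} J(s, \theta) f(s)\, ds \right| = 0 \tag{4.3}$$

where

$$J(s, \theta) = -\frac{(s - \theta)^2}{2\beta^2} + \mathbb{1}_{\{|s - \theta| < \delta\}} \left( \frac{(s - \theta)^2}{2} \left( \frac{1}{\beta^2} - \frac{1}{\alpha^2} \right) + \log\frac{(1 - p)\beta}{p\alpha} \right) + c \tag{4.4}$$

where $c$ is the same constant as in (3.6). The limit function $\int_{\mathbb{R}} J(s, \theta) f(s)\, ds$ turns
out to be differentiable for every value of $\theta$ and to have a unique stationary point at
$\theta = \theta^*$, which turns out to be the global maximum. Unfortunately, this fact by itself
does not guarantee that the global and local maxima of $L_N(\theta, \hat\omega(\theta))$ will indeed converge to $\theta^*$. In our
derivations the properties of the function $\int_{\mathbb{R}} J(s, \theta) f(s)\, ds$ will not play any direct
role and therefore they will not be proven here. The main technical result, which will
be proven in Appendix B, is the following: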

Theorem 4.2. Denote by $S_N$ the set of local maxima of $L_N(\theta, \hat\omega(\theta))$. Then,

$$\lim_{N \to +\infty} \max_{\xi \in S_N} |\xi - \theta^*| = 0 \tag{4.5}$$

almost surely and in mean square sense.
This has an immediate consequence.
Corollary 4.3.

$$\lim_{N \to +\infty} \hat\theta_N^{IA} = \lim_{N \to +\infty} \hat\theta_N^{ML} = \theta^* \tag{4.6}$$

almost surely and in mean square sense.

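Regarding the classification error, we have instead the following result:
Proposition 4.4.

$$\lim_{N \to +\infty} \frac{1}{N} E\, d_H(\hat\omega^{IA}, \omega^*) = \lim_{N \to +\infty} \frac{1}{N} E\, d_H(\hat\omega^{ML}, \omega^*) = q(p, \alpha, \beta) \tag{4.7}$$

where

$$q(p, \alpha, \beta) = (1 - p)\, \mathrm{erfc}\left( \frac{\delta}{\alpha\sqrt{2}} \right) + p \left[ 1 - \mathrm{erfc}\left( \frac{\delta}{\beta\sqrt{2}} \right) \right]$$

and $\mathrm{erfc}(x) := \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2}\, dt$ is the complementary error function.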

These results ensure that the IA performs, in the limit of large number of units
$N$, as the centralized optimal ML estimator. Moreover, they also show consistency in
the estimation of the parameter $\theta^*$. As expected, for $N \to +\infty$ the classification error
does not go to 0, since the increase of measurements is exactly matched by the same
increase of variables to be estimated. Consistency, however, is obtained when $p$ goes to
zero, since we have that $\lim_{p \to 0} q(p, \alpha, \beta) = 0$. Moreover, notice that the dependence
of the function $q$ on the parameters $\alpha$ and $\beta$ is exclusively through their ratio $\beta/\alpha$. In
particular, we have

$$\lim_{\beta/\alpha \to +\infty} q(p, \alpha, \beta) = 0$$



lim q(p, a, /3) = 1. 
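
The limiting classification error of Proposition 4.4 can be evaluated with the Python standard library. The sketch below assumes that the limit is the prior-weighted sum of two misclassification probabilities: a non-faulty node whose noise exceeds the threshold, and a faulty node whose noise stays below it. The function name `classification_error_limit` is hypothetical.

```python
import math

def classification_error_limit(p, alpha, beta):
    """Limiting relative classification error; assumes alpha < beta, 0 < p < 1."""
    d = math.sqrt(2.0 * math.log((1.0 - p) * beta / (p * alpha))
                  / (alpha ** -2 - beta ** -2))
    # probability that a non-faulty node is labelled faulty
    miss_alpha = math.erfc(d / (alpha * math.sqrt(2.0)))
    # probability that a faulty node is labelled non-faulty
    miss_beta = 1.0 - math.erfc(d / (beta * math.sqrt(2.0)))
    return (1.0 - p) * miss_alpha + p * miss_beta
```

Evaluating the function on two pairs with the same ratio between the two noise levels gives the same value up to rounding, in agreement with the remark on the dependence through the ratio only.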



5. Simulations. In this section, we propose some numerical simulations. We 
test our algorithm for different graph architectures and dimensions, and we compare 
it with the IML and EM algorithms. Our goal is to give evidence of the theoretical 
results' validity and also to evaluate cases that are not included in our analysis: the 
good numerical outcomes we obtain suggest that convergence should hold in broader 
frameworks. The numerical setting for our simulations is now presented. 



Model: the sensors perform measurements according to the model (3.1) with
$\theta^* = 0$, $\alpha = 0.3$, $\beta = 10$; the prior probability $P(\omega_i^* = \beta)$ is equal to $p = 0.25$.

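Communication architectures: given a strongly connected symmetric graph
$\mathcal{G} = (V, \mathcal{E})$, we use the so-called Metropolis random walk construction for $P$ (see [23]),
which amounts to the following: if $i \neq j$,

$$P_{ij} = \begin{cases} 0 & \text{if } (i, j) \notin \mathcal{E} \\ (\max\{\deg(i) + 1, \deg(j) + 1\})^{-1} & \text{if } (i, j) \in \mathcal{E} \end{cases}$$

where $\deg(i)$ denotes the degree (the number of neighbors) of unit $i$ in the graph $\mathcal{G}$,
while the diagonal entries $P_{ii}$ are determined by the requirement that $P$ be stochastic.
$P$ constructed in this way is automatically irreducible and aperiodic.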
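
The Metropolis construction is straightforward to code. The following is a minimal sketch, assuming the graph is given as a dictionary mapping each node to the set of its neighbours (symmetric, without self-loops) and that the diagonal entries are chosen so that each row sums to one; the function name `metropolis_matrix` and the input format are hypothetical.

```python
def metropolis_matrix(neighbors):
    """Build the Metropolis matrix as a list of rows from an adjacency dict."""
    nodes = sorted(neighbors)
    idx = {v: k for k, v in enumerate(nodes)}
    n = len(nodes)
    P = [[0.0] * n for _ in range(n)]
    for i in nodes:
        for j in neighbors[i]:
            # off-diagonal weight: 1 / (max(deg(i), deg(j)) + 1)
            P[idx[i]][idx[j]] = 1.0 / (max(len(neighbors[i]), len(neighbors[j])) + 1)
        # diagonal weight: whatever is left so that the row sums to one
        P[idx[i]][idx[i]] = 1.0 - sum(P[idx[i]])
    return P
```

On a regular graph all nonzero entries coincide, while on irregular graphs the nodes with low degree keep a larger self-weight.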
We consider the following topologies: 
1. Complete graph: $P_{ij} = \frac{1}{N}$ for every $i, j = 1, \dots, N$; it actually corresponds to
the centralized case.

2. Ring: $N$ agents are arranged on a circle, and each agent communicates with
its first neighbor on each side (left and right). The corresponding circulant
symmetric matrix $P$ is given by $P_{ij} = \frac{1}{3}$ for every $i = 2, \dots, N - 1$ and
$j \in \{i - 1, i, i + 1\}$; $P_{11} = P_{12} = P_{1N} = \frac{1}{3}$; $P_{N1} = P_{N,N-1} = P_{NN} = \frac{1}{3}$;
$P_{ij} = 0$ elsewhere.

3. Torus-grid graph: sensors are deployed on a two dimensional grid and are 
each connected with their four neighbors; the last node of each row of the 
grid is connected with the first node of the same row, and analogously on 
columns, so that a torus is obtained. The so-obtained graph is regular. 

4. Random Geometric Graph with radius r = 0.3: sensors are deployed in the 
square [0, 1] x [0, 1], their positions being randomly generated with a uniform 
distribution; links are switched on between two sensors whenever the distance 
is less than r. We only envisage connected realizations. 

From Theorem 4.1, $P$ is required to possess positive eigenvalues: our intuition
is that this hypothesis, which is useful to prove the convergence of the IA, is not
really necessary. We test this conjecture on the ring graph, whose eigenvalues are
known [24] to be $\lambda_m = \frac{1}{3} \left( 1 + 2\cos\left( \frac{2\pi m}{N} \right) \right)$, $m = 0, \dots, N - 1$, and which are not
necessarily positive.
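
The sign of the ring eigenvalues, and the effect of the 'lazy' modification mentioned in the remarks after Theorem 4.1, can be checked directly. This sketch assumes the closed form of the eigenvalues of the circulant matrix with weight one third on the three central diagonals, and uses the fact that mixing a matrix with the identity maps each eigenvalue to the corresponding convex combination with 1. The function names are hypothetical.

```python
import math

def ring_eigenvalues(n):
    """Eigenvalues (1 + 2 cos(2 pi m / n)) / 3 of the ring matrix, m = 0..n-1."""
    return [(1.0 + 2.0 * math.cos(2.0 * math.pi * m / n)) / 3.0 for m in range(n)]

def lazy_eigenvalues(eigs, tau):
    """Eigenvalues of (1 - tau) * I + tau * P, given the eigenvalues of P."""
    return [(1.0 - tau) + tau * lam for lam in eigs]
```

For an even number of nodes the smallest ring eigenvalue is negative, so the ring does not satisfy the hypothesis of Theorem 4.1, whereas its lazy version with mixing parameter one half does.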

Algorithms: We implement and compare the following algorithms: IA with
$\gamma^{(t)} = 1/t^{\xi}$ for different choices of $\xi \in \{0.5, 0.7, 0.9\}$, and the two centralized iterative
algorithms, IML and EM, described in Section 3.3.

Outcomes: we show the performance of the aforementioned algorithms in terms
of classification error and of mean square error on the global parameter, as a function
of the number of sensors $N$. All the outcomes are obtained by averaging over 400 Monte
Carlo runs.

We observe that the classification error (Figure 5.1) converges for $N \to \infty$ for
all the considered algorithms. On the other hand, when $N$ is small, IA performs
better than IML and EM, no matter which graph topology has been chosen: this
suggests that decentralization is then not a drawback for IA. Moreover, for smaller
$\gamma^{(t)}$ (i.e., slowing down the procedure), we obtain better IA performance in terms
of classification. Nevertheless, this is not universally true: in other simulations, in
fact, we have noticed that if $\gamma^{(t)}$ is too small, the performance is worse. This is
not surprising, since $\gamma^{(t)}$ determines the weights assigned to the consensus and input
driven parts, whose contributions must be somehow balanced in order to obtain the
best solution. An important point that we will study in the future is the optimization
of $\gamma^{(t)}$, whose choice may in turn depend on the graph topology and on the weights
assigned in the matrix $P$.

Analogous considerations can be made for the mean square error on $\hat\theta$: when $N$
increases, the mean square error decays to zero.

We remark that convergence is numerically shown also for the ring topology, which
is not covered by our theoretical analysis. Hence, our guess is that convergence
could be proved even under weaker hypotheses on the matrix $P$.

For the interested reader, a graphical user interface of our algorithm is available 
and downloadable on http://calvino.polito.it/~fosson/software.html. 
Figure 5.1. Relative classification error. Panels: (a) complete graph; (b) ring; (c) grid; (d) random geometric graph.



6. Concluding remarks. In this paper, we have presented a fully distributed
algorithm for simultaneous estimation and classification in a sensor network,
starting from noisy measurements. The algorithm only requires local cooperation among
units in the network. Numerical simulations show remarkable performance. The
main contribution is the proof of convergence of the algorithm to a local maximum of
the log-likelihood function of the centralized ML problem. The performance of the algorithm has also been studied
when the network size is large, proving that the solution of the proposed algorithm
concentrates around the classical ML solution.

Different variants are possible: for example, the generalization to multiple classes
with unknown prior probabilities, which should then be inferred. The choice of the sequence $\{\gamma^{(t)}\}_{t \in \mathbb{N}}$
is critical, since it influences both convergence time and final accuracy; the determination
of a protocol for the adaptive search of the sequence $\{\gamma^{(t)}\}_{t \in \mathbb{N}}$ is left for future
work.
Appendix A. Proof of Theorem 4.1. Consider the discrete-time dynamical
system defined by the update equations (4.1) and (4.2): the proof of its convergence
is obtained through intermediate steps.

1. First, we show that, for sufficiently large $t$, the vectors $\mu^{(t)}$, $\nu^{(t)}$, and $\hat\theta^{(t)}$ are
close to consensus vectors and we prove their convergence, assuming $\hat\omega^{(t)}$ has
already stabilized.

2. Second, we prove the stabilization of $\hat\omega^{(t)}$ in finite time, by modelling the
system in (4.1) and (4.2) as a switching dynamical system.

3. Finally, combining these facts together we conclude the proof.

A.1. Towards consensus. We start with some notation: let $\Omega := I - N^{-1} \mathbb{1}\mathbb{1}^T$;
given $x \in \mathbb{R}^V$, let $\bar{x} := N^{-1} \mathbb{1}^T x$, so that $x = \bar{x}\mathbb{1} + \Omega x$.

Given a bounded sequence $u^{(t)} \in \mathbb{R}^V$, consider the dynamics

$$x^{(t+1)} = (1 - \gamma^{(t)}) P x^{(t)} + \gamma^{(t)} u^{(t)}, \qquad t \in \mathbb{N} \tag{A.1}$$

where $x^{(0)}$ is any fixed vector, and where, we recall the standing assumptions,

(a) $\gamma^{(t)} \in (0, 1)$, $\gamma^{(t)} > 1/t$, $\gamma^{(t)} \searrow 0$, and $\gamma^{(t)} = \gamma^{(t+1)} + o(\gamma^{(t+1)})$ for $t \to +\infty$;

(b) $P \in \mathbb{R}^{V \times V}$ is a stochastic, symmetric, primitive matrix with positive eigenvalues.

A useful consequence of the assumptions on $\gamma^{(t)}$ is the following:

$$\prod_{s = t_0}^{t} (1 - \gamma^{(s)}) < e^{-\sum_{s = t_0}^{t} \gamma^{(s)}} < t_0 \gamma^{(t)} \tag{A.2}$$

for any choice of $t > t_0 > 0$.

On the other hand, as a consequence of the assumptions on $P$ (see [25]) we have
that $P^t \to N^{-1} \mathbb{1}\mathbb{1}^T$, or equivalently that $P^t \Omega \to 0$ for $t \to +\infty$. More precisely, we
can order the eigenvalues of $P$ as $1 = \mu_1 > \mu_2 \geq \cdots \geq \mu_N > 0$, and we have that

$$\|P^t \Omega\| \leq \mu_2^t.$$

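Lemma A.1. It holds

$$\Omega x^{(t)} = O(\gamma^{(t)}), \qquad \text{for } t \to +\infty.$$

Proof. From (A.1) and the fact that $\Omega P = P \Omega$ we get, for any fixed $t_0$ and $t > t_0$,

$$\Omega x^{(t+1)} = \prod_{s = t_0}^{t} (1 - \gamma^{(s)}) P^{t - t_0 + 1} \Omega x^{(t_0)} + \sum_{s = t_0}^{t} \prod_{k = s + 1}^{t} (1 - \gamma^{(k)}) \gamma^{(s)} P^{t - s} \Omega u^{(s)}. \tag{A.3}$$

This yields

$$\|\Omega x^{(t+1)}\|_2 \leq \prod_{s = t_0}^{t} (1 - \gamma^{(s)}) \|\Omega x^{(t_0)}\|_2 + \sum_{s = t_0}^{t} \prod_{k = s + 1}^{t} (1 - \gamma^{(k)}) \gamma^{(s)} \|P^{t - s} \Omega\| \|u^{(s)}\|_2$$

$$\leq \prod_{s = t_0}^{t} (1 - \gamma^{(s)}) \|\Omega x^{(t_0)}\|_2 + K \sum_{s = t_0}^{t} \prod_{k = s + 1}^{t} (1 - \gamma^{(k)}) \gamma^{(s)} |\mu_2|^{t - s} \tag{A.4}$$

with $K := \max_s \|u^{(s)}\|_2$.

Fix now $0 < \epsilon < 1 - |\mu_2|$ and let $t_0 \in \mathbb{N}$ be such that $\gamma^{(t+1)} / \gamma^{(t)} \in (1 - \epsilon, 1)$ for
all $t \geq t_0$. Hence, for $t \geq s \geq t_0$, we have that $\gamma^{(s)} \leq \frac{\gamma^{(t)}}{(1 - \epsilon)^{t - s}}$. Consider now the
estimate (A.4) with this choice of $t_0$. We get

$$\|\Omega x^{(t+1)}\|_2 \leq \prod_{s = t_0}^{t} (1 - \gamma^{(s)}) \|\Omega x^{(t_0)}\|_2 + K \gamma^{(t)} \sum_{s = t_0}^{t} \left( \frac{|\mu_2|}{1 - \epsilon} \right)^{t - s}$$

$$\leq \prod_{s = t_0}^{t} (1 - \gamma^{(s)}) \|\Omega x^{(t_0)}\|_2 + \frac{K \gamma^{(t)}}{1 - \frac{|\mu_2|}{1 - \epsilon}}.$$

Using now (A.2) the proof is completed. □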

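Proposition A.2. If $\exists\, t_0 \in \mathbb{N}$ s.t. $u^{(t)} = u$ $\forall t \geq t_0$, then

$$\lim_{t \to +\infty} x^{(t)} = \bar{u}\mathbb{1}.$$

Proof. Write $x^{(t)} = \bar{x}^{(t)}\mathbb{1} + \Omega x^{(t)}$ and notice that from Lemma A.1 it is sufficient
to prove that $\lim_{t \to +\infty} \bar{x}^{(t)} = \bar{u}$. From (A.1) and the fact that $\mathbb{1}^T P = \mathbb{1}^T$, we obtain

$$\bar{x}^{(t+1)} - \bar{u} = \prod_{s = t_0}^{t} (1 - \gamma^{(s)}) (\bar{x}^{(t_0)} - \bar{u})$$

which goes to zero from the non-summability of $\gamma^{(t)}$. □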

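We now apply these results to the analysis of $\hat\theta^{(t)}$. We start with a representation result.

Lemma A.3. It holds, for $t \to +\infty$,

$$\hat\theta^{(t)} = \frac{\bar\mu^{(t)}}{\bar\nu^{(t)}}\mathbf{1} + \frac{1}{\bar\nu^{(t)}}\Big(\Omega\mu^{(t)} - \frac{\bar\mu^{(t)}}{\bar\nu^{(t)}}\,\Omega\nu^{(t)}\Big) + o(\gamma^{(t)}). \tag{A.5}$$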

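Proof. For any $i \in V$,

$$\hat\theta_i^{(t)} = \frac{\mu_i^{(t)}}{\nu_i^{(t)}} = \frac{\bar\mu^{(t)}}{\bar\nu^{(t)}} + \frac{1}{\nu_i^{(t)}}\Big[(\Omega\mu^{(t)})_i - \frac{\bar\mu^{(t)}}{\bar\nu^{(t)}}\,(\Omega\nu^{(t)})_i\Big].$$

It follows from Lemma A.1 that $\mu^{(t)} = \bar\mu^{(t)}\mathbf{1} + O(\gamma^{(t)})$ and $\nu^{(t)} = \bar\nu^{(t)}\mathbf{1} + O(\gamma^{(t)})$ for $t \to +\infty$. This and the fact that $\bar\nu^{(t)}$ is bounded away from 0 (indeed $\bar\nu^{(t)} \ge \beta^{-2}$ for all $t \ge 0$), yields

$$\frac{1}{\nu_i^{(t)}} = \frac{1}{\bar\nu^{(t)}} + O(\gamma^{(t)}),$$

from which thesis follows. □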

We can now present our first convergence result.

Corollary A.4. It holds, for $t \to +\infty$,

$$\bar\theta^{(t)} = \frac{\bar\mu^{(t)}}{\bar\nu^{(t)}} + o(\gamma^{(t)}), \qquad \Omega\hat\theta^{(t)} = O(\gamma^{(t)}).$$

Proof. Both relations are obtained from (A.5). The first one is immediate. The second one follows from Lemma A.1 and the fact that $\bar\nu^{(t)}$ stays bounded away from 0. □

Corollary A.4 says that the estimate $\hat\theta^{(t)}$ is close to a consensus for sufficiently large $t$. Something more precise can be stated if we know that $\hat\omega^{(t)}$ stabilizes in finite time, as explained in the next result.

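Corollary A.5. If $\exists\, t_0 \in \mathbb{N}$ s.t. $\hat\omega^{(t)} = \omega^{IA}$ $\forall t \ge t_0$, then

$$\lim_{t\to+\infty}\hat\theta^{(t)} = \hat\theta(\omega^{IA})\,\mathbf{1} = \frac{\sum_{i\in V} y_i\,[\omega_i^{IA}]^{-2}}{\sum_{i\in V}[\omega_i^{IA}]^{-2}}\,\mathbf{1}.$$

Proof. Proposition A.2 guarantees that $\mu^{(t)}$ and $\nu^{(t)}$ converge to $N^{-1}\sum_{i\in V} y_i[\omega_i^{IA}]^{-2}\,\mathbf{1}$ and $N^{-1}\sum_{i\in V}[\omega_i^{IA}]^{-2}\,\mathbf{1}$, respectively. This yields the thesis. □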
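The limit in Corollary A.5 is an inverse-variance weighted average of the measurements. A minimal sketch follows; the function name `theta_hat` and the sample numbers are hypothetical.

```python
import numpy as np

def theta_hat(y, omega):
    """Weighted average sum_i y_i omega_i^-2 / sum_i omega_i^-2."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(omega, dtype=float) ** 2   # weight of each sensor
    return float(np.sum(w * y) / np.sum(w))

# Two reliable sensors and one sensor classified as faulty (large omega).
y = [1.0, 2.0, 10.0]
omega = [1.0, 1.0, 10.0]
print(theta_hat(y, omega))
```

The measurement attached to the large standard deviation barely moves the estimate, which is the reason the classification step matters for the quality of the final value.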

A.2. Stabilization of $\hat\omega^{(t)}$. We are going to prove that the vector $\hat\omega^{(t)}$ almost surely stabilizes in finite time: this, by virtue of the previous considerations, will complete our proof. Proving this fact requires considerable effort and will be achieved through several intermediate steps.

We start by observing that, since $\hat\omega^{(t)}$ can only assume values in a finite set, equations (4.1) and (4.2) can be conveniently modeled by a switching system as shown below.

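For reasons which will be clear below, in this subsection we will replace the configuration space $\{\alpha,\beta\}^V$ with the augmented state space $\{\alpha,\beta+,\beta-\}^V$. If $\omega \in \{\alpha,\beta+,\beta-\}^V$, define

$$\Theta_\omega = \{x \in \mathbb{R}^V : |x_i - y_i| < \delta \text{ if } \omega_i = \alpha,\ x_i \ge y_i + \delta \text{ if } \omega_i = \beta+,\ x_i \le y_i - \delta \text{ if } \omega_i = \beta-\}.$$

We clearly have $\mathbb{R}^V = \bigsqcup_{\omega\in\{\alpha,\beta+,\beta-\}^V}\Theta_\omega$.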
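The partition into regions $\Theta_\omega$ amounts to a componentwise three-way test. The sketch below labels a point and then sweeps along the consensus line; the function name, the string labels and the sample data are illustrative assumptions, and boundary points are assigned to the $\beta\pm$ regions as in the definition above.

```python
def region_label(x, y, delta):
    """Return the label omega of the region Theta_omega that contains x."""
    labels = []
    for xi, yi in zip(x, y):
        if abs(xi - yi) < delta:
            labels.append("alpha")      # measurement within delta of the estimate
        elif xi >= yi + delta:
            labels.append("beta+")      # estimate above the measurement
        else:
            labels.append("beta-")      # estimate below the measurement
    return tuple(labels)

# Sweep along the line {lambda * 1}: the label changes one component at a time
# when the measurements are in generic position.
y = [0.0, 2.5, 6.0]
delta = 1.0
for lam in (-2.0, 0.5, 2.0, 4.0, 8.0):
    print(lam, region_label([lam] * len(y), y, delta))
```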

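On each $\Theta_\omega$ the dynamical system is linear. Indeed, define the maps $f_\omega : \mathbb{R}\times\mathbb{R}^V \to \mathbb{R}^V$ and $g_\omega : \mathbb{R}\times\mathbb{R}^V \to \mathbb{R}^V$ by

$$[f_\omega(t,x)]_i = (1-\gamma^{(t)})[Px]_i + \gamma^{(t)}\,y_i\,\omega_i^{-2}$$

$$[g_\omega(t,x)]_i = (1-\gamma^{(t)})[Px]_i + \gamma^{(t)}\,\omega_i^{-2}$$

where, conventionally, $\omega_i^2 = \beta^2$ if $\omega_i = \beta+, \beta-$. Then, if $\hat\theta^{(t)} \in \Theta_\omega$, (4.1a), (4.1b), and (4.1c) can be written as

$$\mu^{(t+1)} = f_\omega(t,\mu^{(t)}), \qquad \nu^{(t+1)} = g_\omega(t,\nu^{(t)}), \qquad \hat\theta_i^{(t+1)} = \mu_i^{(t+1)}/\nu_i^{(t+1)}.$$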

Notice that this is a closed-loop switching system, since the switching policy is determined by $\hat\theta^{(t)}$. It is clear that the stabilization of $\hat\omega^{(t)}$ is equivalent to the fact that there exist an $\omega \in \{\alpha,\beta+,\beta-\}^V$ and a time $\bar t$ such that $\hat\theta^{(t)} \in \Theta_\omega$ for all $t \ge \bar t$. From Corollary A.5 candidate limit points for $\hat\theta^{(t)}$ are

$$\hat\theta(\omega)\,\mathbf{1} = \frac{\sum_{i\in V} y_i\,\omega_i^{-2}}{\sum_{i\in V}\omega_i^{-2}}\,\mathbf{1}.$$

Also, from Corollary A.4, the dynamics can be conveniently analysed by studying it in a neighborhood of the line $\Lambda = \{\lambda\mathbf{1} \mid \lambda \in \mathbb{R}\}$.

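We now make an assumption which holds almost everywhere with respect to the choice of the $y_i$'s and, consequently, does not entail any loss of generality in our proof.

ASSUMPTION:

• $y_i - y_j \notin \{0, \pm\delta, \pm 2\delta\}$ for all $i \ne j$;

• $\hat\theta(\omega) - y_i \notin \{\pm\delta\}$ for all $\omega \in \{\alpha,\beta+,\beta-\}^V$ and for all $i$.

This assumption has a number of consequences which will be used later on:

(C1) $\hat\theta(\omega)\mathbf{1},\ y_i\mathbf{1} \in \bigcup_{\omega'\in\{\alpha,\beta+,\beta-\}^V}\operatorname{int}(\Theta_{\omega'})$ for all $\omega \in \{\alpha,\beta+,\beta-\}^V$ and for all $i \in V$;

(C2) $\Lambda \cap \overline{\Theta}_\omega \cap \overline{\Theta}_{\omega'} \cap \overline{\Theta}_{\omega''} = \emptyset$ for any triple of distinct $\omega, \omega', \omega''$. In other terms, $\Lambda$ always crosses boundaries among the regions $\Theta_\omega$ at internal points of faces.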

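We now introduce some further notation, which will be useful in the rest of the paper.

$$\Lambda^\varepsilon := \{x \in \mathbb{R}^V : \|\Omega x\|_2 \le \varepsilon\}, \qquad \Theta^\varepsilon_\omega := \Lambda^\varepsilon \cap \Theta_\omega,$$

$$\Gamma := \{\omega \in \{\alpha,\beta+,\beta-\}^V : \Theta_\omega \cap \Lambda \ne \emptyset\}.$$

For any $\omega \in \Gamma$ consider

$$\Pi_\omega = \{\pi = \overline{\Theta}_\omega \cap \overline{\Theta}_{\omega'} : d_H(\omega,\omega') = 1,\ \pi \cap \Lambda = \emptyset\}$$

and define $\sigma_\omega := \min_{\pi\in\Pi_\omega} d(\Theta_\omega \cap \Lambda, \pi) > 0$.¹

In the sequel, we will use the natural ordering on $\Lambda$: given the sets $X, Y \subseteq \Lambda$, $X < Y$ means that $x < y$ for all $x \in X$ and $y \in Y$.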

Definition A.6. Given two elements $\omega, \omega' \in \Gamma$, we say that $\omega'$ is the future-follower of $\omega$ (or also that $\omega$ is the past-follower of $\omega'$) if the following happens:

(A) There exists $i_0$ such that $\omega_i = \omega'_i$ for all $i \ne i_0$ and $\omega_{i_0} \ne \omega'_{i_0}$;

(B) $\Theta_\omega \cap \Lambda < \Theta_{\omega'} \cap \Lambda$.

Notice that, in order for $\omega$ and $\omega'$ to satisfy the definition above, it must necessarily happen that either $\omega_{i_0} = \alpha$ and $\omega'_{i_0} = \beta+$, or $\omega_{i_0} = \beta-$ and $\omega'_{i_0} = \alpha$. Given $\omega \in \Gamma$, its future-follower (if it exists) will be denoted by $\omega^+$. It is clear (because of property (C2) described above) that we can order the elements in $\Gamma$ as $\omega^1, \omega^2, \dots, \omega^M$ in such a way that $\omega^{r+1} = (\omega^r)^+$ for every $r = 1, \dots, M-1$.

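Given $\omega \in \Gamma$, consider the following subsets of $\Lambda^\varepsilon$ (see Fig. A.1):

$$\mathcal{M}^\varepsilon_\omega := \{x \in \Theta^\varepsilon_\omega : \bar x\mathbf{1} + \Omega z \in \Theta^\varepsilon_\omega,\ \forall z : \|z\|_2 \le \varepsilon\}$$

$$\mathcal{L}^\varepsilon_{\omega,\omega^+} := \{x \in \Lambda^\varepsilon : \mathcal{M}^\varepsilon_\omega \cap \Lambda < \bar x\mathbf{1} < \mathcal{M}^\varepsilon_{\omega^+} \cap \Lambda\}$$

(with the implicit assumption that $\mathcal{L}^\varepsilon_{\omega,\omega^+} = \emptyset$ if $\omega^+$ does not exist). We clearly have

$$\Lambda^\varepsilon = \bigcup_{\omega\in\Gamma}\big(\mathcal{M}^\varepsilon_\omega \cup \mathcal{L}^\varepsilon_{\omega,\omega^+}\big).$$

Notice that, because of property (C1), we can always choose $\varepsilon_0 \in (0, \min_{\omega\in\Gamma}\sigma_\omega)$ such that

$$\hat\theta(\omega)\mathbf{1},\ y_i\mathbf{1} \in \bigcup_{\omega'\in\Gamma}\mathcal{M}^{\varepsilon_0}_{\omega'}, \qquad \forall\omega \in \Gamma,\ \forall i \in V.$$

This implies that there exists $\tilde c > 0$ such that

$$d\Big(\bigcup_{\omega'\in\Gamma}\partial_\Lambda\big(\mathcal{M}^\varepsilon_{\omega'} \cap \Lambda\big),\ \{\hat\theta(\omega)\mathbf{1},\ y_i\mathbf{1}\}\Big) \ge \tilde c, \qquad \forall\varepsilon \le \varepsilon_0 \tag{A.6}$$

where $\partial_\Lambda(\cdot)$ denotes the boundary of a set in the relative topology of $\Lambda$.

¹ $d(\Theta_\omega \cap \Lambda, \pi)$ denotes the distance between the set $\Theta_\omega \cap \Lambda$ and the set $\pi$.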

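Fix now $\varepsilon < \varepsilon_0$ and choose $t_\varepsilon$ such that $\hat\theta^{(t)} \in \Lambda^\varepsilon$ for all $t \ge t_\varepsilon$ (it exists by Corollary A.4). From now on we consider times $t \ge t_\varepsilon$. Our aim is to prove through intermediate steps the following facts:

F1) if $\hat\theta(\omega)\mathbf{1} \in \mathcal{M}^\varepsilon_\omega$ then $\mathcal{M}^\varepsilon_\omega$ is an asymptotically invariant set for $\hat\theta^{(t)}$, namely, when $t$ is sufficiently large, if $\hat\theta^{(t)} \in \mathcal{M}^\varepsilon_\omega$ then $\hat\theta^{(t+1)} \in \mathcal{M}^\varepsilon_\omega$;

F2) if $\hat\theta(\omega)\mathbf{1} \notin \mathcal{M}^\varepsilon_\omega$ then $\hat\theta^{(t)} \notin \mathcal{M}^\varepsilon_\omega$ for $t$ sufficiently large;

F3) $\hat\theta^{(t)} \notin \bigcup_{\omega\in\{\alpha,\beta+,\beta-\}^V}\mathcal{L}^\varepsilon_{\omega,\omega^+}$ for $t$ sufficiently large.

F1) Asymptotic invariance of $\mathcal{M}^\varepsilon_\omega$, when $\hat\theta(\omega)\mathbf{1} \in \mathcal{M}^\varepsilon_\omega$.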

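Lemma A.7. If $\hat\theta^{(t)} \in \Theta_\omega$ then there exist $c^{(t)} \in [\alpha^2/\beta^2, \beta^2/\alpha^2]$ and $r^{(t)} = o(\gamma^{(t)})$ for $t \to +\infty$ such that

$$\bar\theta^{(t+1)} = \bar\theta^{(t)} + c^{(t)}\gamma^{(t)}\big[\hat\theta(\omega) - \bar\theta^{(t)}\big] + r^{(t)}. \tag{A.7}$$

Proof. If $\hat\theta^{(t)} \in \Theta_\omega$ then

$$\begin{aligned}
\frac{\bar\mu^{(t+1)}}{\bar\nu^{(t+1)}} &= \frac{(1-\gamma^{(t)})\bar\mu^{(t)} + \gamma^{(t)}N^{-1}\sum_i \omega_i^{-2}y_i}{(1-\gamma^{(t)})\bar\nu^{(t)} + \gamma^{(t)}N^{-1}\sum_i \omega_i^{-2}} \\
&= \frac{\bar\mu^{(t)}}{\bar\nu^{(t)}} + \frac{\bar\nu^{(t)}\gamma^{(t)}N^{-1}\sum_i \omega_i^{-2}y_i - \bar\mu^{(t)}\gamma^{(t)}N^{-1}\sum_i \omega_i^{-2}}{\bar\nu^{(t+1)}\,\bar\nu^{(t)}}.
\end{aligned}$$

Choosing $c^{(t)} = \dfrac{N^{-1}\sum_i \omega_i^{-2}}{\bar\nu^{(t+1)}} \in [\alpha^2/\beta^2, \beta^2/\alpha^2]$ and using Corollary A.4 thesis easily follows. □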
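The algebra behind Lemma A.7 can be verified on numbers. The sketch below performs one step of the averaged recursions for $\bar\mu$ and $\bar\nu$ with the classification frozen, and compares the new ratio with the right-hand side of (A.7) without the remainder. All names and numerical values are assumptions made for illustration.

```python
import numpy as np

def mean_step(mu_bar, nu_bar, gamma, y, omega):
    """One averaged step with frozen omega; returns new averages, c and theta_hat(omega)."""
    w = 1.0 / np.asarray(omega, dtype=float) ** 2
    a = float(np.mean(w * np.asarray(y, dtype=float)))   # N^-1 sum y_i omega_i^-2
    b = float(np.mean(w))                                # N^-1 sum omega_i^-2
    mu_next = (1.0 - gamma) * mu_bar + gamma * a
    nu_next = (1.0 - gamma) * nu_bar + gamma * b
    c = b / nu_next                                      # coefficient c^(t)
    return mu_next, nu_next, c, a / b

alpha, beta = 0.3, 3.0
y = [0.2, 1.0, 5.0]
omega = [alpha, alpha, beta]
mu_bar, nu_bar, gamma = 0.7, 2.0, 0.05      # nu_bar chosen inside [beta^-2, alpha^-2]

mu_next, nu_next, c, target = mean_step(mu_bar, nu_bar, gamma, y, omega)
lhs = mu_next / nu_next
rhs = mu_bar / nu_bar + c * gamma * (target - mu_bar / nu_bar)
print(lhs, rhs, c)
```

The two printed ratios agree up to rounding, and the coefficient falls in the interval stated in the lemma as long as the averaged $\nu$ stays between $\beta^{-2}$ and $\alpha^{-2}$.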

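PROPOSITION A.8 (Proof of F1)). There exists $t' \ge t_\varepsilon$ such that, if $\hat\theta(\omega)\mathbf{1} \in \mathcal{M}^\varepsilon_\omega$, then

$$\hat\theta^{(t)} \in \mathcal{M}^\varepsilon_\omega \ \Longrightarrow\ \hat\theta^{(t+1)} \in \mathcal{M}^\varepsilon_\omega \qquad \forall t \ge t'.$$

Proof. Consider the relation (A.7). If $\hat\theta^{(t)} \in \mathcal{M}^\varepsilon_\omega$ and if $t$ is large enough so that $c^{(t)}\gamma^{(t)} < 1$, we have, by convexity, that

$$z := \big[\bar\theta^{(t)} + c^{(t)}\gamma^{(t)}\big(\hat\theta(\omega) - \bar\theta^{(t)}\big)\big]\mathbf{1} \in \mathcal{M}^\varepsilon_\omega.$$

Moreover, because of (A.6) and the fact that $c^{(t)}$ is bounded away from 0, there exists $c' > 0$ such that $d(z, \partial_\Lambda(\mathcal{M}^\varepsilon_\omega \cap \Lambda)) \ge c'\gamma^{(t)}$. Proof is then completed by selecting $t' \ge t_\varepsilon$ such that $c^{(t)}\gamma^{(t)} < 1$ and $|r^{(t)}| < c'\gamma^{(t)}/2$ for all $t \ge t'$. □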

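F2) Transitivity of $\mathcal{M}^\varepsilon_\omega$ when $\hat\theta(\omega)\mathbf{1} \notin \mathcal{M}^\varepsilon_\omega$. Our next goal is to prove that if $\hat\theta(\omega)\mathbf{1} \notin \mathcal{M}^\varepsilon_\omega$, then, from a certain time on, $\hat\theta^{(t)}$ will remain outside $\mathcal{M}^\varepsilon_\omega$. A technical lemma based on convexity arguments is required.

Lemma A.9. Let $\omega \in \Gamma$ be such that there exists its future-follower $\omega^+$. Then,

$$\hat\theta(\omega)\mathbf{1} > \Theta_\omega \cap \Lambda \ \Longrightarrow\ \hat\theta(\omega^+)\mathbf{1} > \Theta_\omega \cap \Lambda$$
$$\hat\theta(\omega^+)\mathbf{1} < \Theta_{\omega^+} \cap \Lambda \ \Longrightarrow\ \hat\theta(\omega)\mathbf{1} < \Theta_{\omega^+} \cap \Lambda.$$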

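Proof. Suppose $\omega_i = \omega^+_i$, $\forall i \ne i_0$ and $\omega_{i_0} = \beta-$, $\omega^+_{i_0} = \alpha$ (the other case can be treated in an analogous way). Pick $x'\mathbf{1} \in \Theta_\omega \cap \Lambda$ and $x''\mathbf{1} \in \Theta_{\omega^+} \cap \Lambda$. From $|x'' - y_{i_0}| < \delta$ and $|x' - y_{i_0}| \ge \delta$ it immediately follows that $x'' > y_{i_0} - \delta$, $x' \le y_{i_0} - \delta$ and, in particular, the fact

$$y_{i_0}\mathbf{1} > \Theta_\omega \cap \Lambda. \tag{A.8}$$

Notice now that

$$\hat\theta(\omega^+) = \frac{\sum_{i\in V\setminus\{i_0\}} y_i\omega_i^{-2} + y_{i_0}\alpha^{-2}}{\sum_{i\in V\setminus\{i_0\}} \omega_i^{-2} + \alpha^{-2}} = \frac{\sum_{i\in V}\omega_i^{-2}}{\sum_{i\in V}\omega_i^{-2} + \alpha^{-2} - \beta^{-2}}\,\hat\theta(\omega) + \frac{\alpha^{-2} - \beta^{-2}}{\sum_{i\in V}\omega_i^{-2} + \alpha^{-2} - \beta^{-2}}\,y_{i_0},$$

that is, $\hat\theta(\omega^+)$ is a convex combination of $\hat\theta(\omega)$ and $y_{i_0}$.

In Figures A.2 and A.3 a picture of the various points is depicted when $\hat\theta(\omega)\mathbf{1} > \Theta_\omega \cap \Lambda$. A convexity argument and the use of (A.8) now allow to conclude. □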
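PROPOSITION A.10 (Proof of F2)). If $\hat\theta(\omega)\mathbf{1} \notin \Theta_\omega$, then there exists $t''$ such that

$$\hat\theta^{(t)} \notin \Theta^\varepsilon_\omega \qquad \forall t \ge t''.$$

Figure A.2. $\omega_{i_0} = \beta-$, $\hat\theta(\omega)\mathbf{1} > \Theta_\omega \cap \Lambda$.

Figure A.3. $\hat\theta(\omega)\mathbf{1} > \Theta_\omega \cap \Lambda$.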
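Proof. Suppose $\hat\theta(\omega)\mathbf{1} > \Theta^\varepsilon_\omega \cap \Lambda$ (the case when it is $<$ can be treated analogously). Lemma A.9 implies that $\hat\theta(\omega^+)\mathbf{1} > \Theta_\omega \cap \Lambda$. Let $\tilde c$ be the constant given in (A.6) and put

$$A := \{x \in \Theta^\varepsilon_\omega \cup \Theta^\varepsilon_{\omega^+} \mid \bar x < a := \min\{\hat\theta(\omega), \hat\theta(\omega^+)\} - \tilde c/2\}.$$

Consider the relation (A.7) and choose $t_1$ in such a way that

$$\big|\bar\theta^{(t+1)} - \bar\theta^{(t)}\big| \le \frac{\beta^2}{\alpha^2}\big(\max_i\{y_i\} - \min_i\{y_i\}\big)\gamma^{(t)} + |r^{(t)}| < \tilde c/2 \tag{A.9}$$

and $|r^{(t)}| < \alpha^2\tilde c\,\gamma^{(t)}/4\beta^2$ for all $t \ge t_1$. It also follows from (A.7) that, if for some $t \ge t_1$, $\hat\theta^{(t)} \in A$, then

$$\bar\theta^{(t+1)} > \bar\theta^{(t)} + \alpha^2\tilde c\,\gamma^{(t)}/4\beta^2. \tag{A.10}$$

Owing to the non-summability of $\gamma^{(t)}$ it follows that if $\hat\theta^{(t)}$ enters $\Theta^\varepsilon_\omega$ for some $t \ge t_1$, then, in finite time it will enter $A \setminus \Theta^\varepsilon_\omega$ and then it will finally exit $A$. In particular there must exist $t_2 \ge t_1$ such that $\bar\theta^{(t_2)} \ge a$. We now prove that $\bar\theta^{(t)}\mathbf{1} > \Theta_\omega$ for every $t \ge t_2$. If not, there must exist a first time index $t_3 > t_2$ such that $\bar\theta^{(t_3)} < a - \tilde c$. Because of (A.9), it must be that $\bar\theta^{(t_3-1)} < a - \tilde c/2$, but this contradicts the fact that on $A$, $\bar\theta$ is increasing (A.10). □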

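F3) Transitivity of $\bigcup_{\omega\in\{\alpha,\beta+,\beta-\}^V}\mathcal{L}^\varepsilon_{\omega,\omega^+}$. We start with the following technical result concerning the general system (A.1).

Lemma A.11. Let $x^{(t)}$ be the sequence defined in (A.1) and suppose that there exists a strictly increasing sequence of switching times $\{\tau_k\}_{k=0}^{+\infty}$ such that

$$u_i^{(s)} = u_i \quad \forall i \ne i_0 \text{ and } \forall s \in [t_0, +\infty[$$

$$u_{i_0}^{(s)} = \begin{cases} v' & \forall s \in I' := \bigcup_k\,[\tau_{2k}, \tau_{2k+1}) \\ v'' & \forall s \in I'' := \bigcup_k\,[\tau_{2k+1}, \tau_{2k+2}). \end{cases}$$

Then, for every $\delta > 0$, there exist $t_\delta$ and two sequences $a_\delta^{(t)} \ge 0$ and $|b_\delta^{(t)}| \le \delta\gamma^{(t)}$, such that

$$\big(\Omega(x^{(t+1)} - x^{(t)})\big)_{i_0} = a_\delta^{(t)}\,(v' - v'') + b_\delta^{(t)}$$

for $t \in I'$, $t \ge t_\delta$.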

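Proof. Let $\phi_j \in \mathbb{R}^V$ be an orthonormal basis of eigenvectors for $P$ relative to the eigenvalues $1 = \lambda_1 > \lambda_2 \ge \dots \ge \lambda_N > 0$. Also assume we have chosen $\phi_1 = N^{-1/2}\mathbf{1}$.

We put

$$F(t) := \frac{\prod_{k=0}^{t}(1-\gamma^{(k)})}{\gamma^{(t)}}$$

and we notice that

$$\frac{F(t+1)}{F(t)} = (1-\gamma^{(t+1)})\,\frac{\gamma^{(t)}}{\gamma^{(t+1)}} \to 1, \qquad \text{for } t \to +\infty.$$

Fix $\varepsilon$ in such a way that $\lambda_2(1+\varepsilon) < 1$ and choose $s_0$ such that

$$\frac{F(s+1)}{F(s)} < 1+\varepsilon, \qquad \forall s \ge s_0.$$

Let $t_0 \ge s_0$ to be fixed later. From (A.3) we can write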



n(i- 7 (s) ) [(i-7 w )^-/ 



s=t 

s=t k=s+l 

n C 1 ^ 

s=t 



W\ ~W P*- 



s=i fe=s+l 



0) 



P*- to fia; (to) 



*-! P (t) *-i p(t-i) 

V- E p—^^^^E^ 1 

s=io — 1 to 



„(*) 



n 1 --r 



1 - 7®) P - I PW o) 



s=to 

(7 w _ 7 ( t -D) £ p t - s -i£__ nu ( S ) +7 (t)pt-toL__ Qu (to, 

s=t 



p(*o) 



s=to x 



t-1 

(t) pt-a- 



P(a) 



(A.11)
(A.12)
(A.13)
(A.14)
It follows from the assumptions on $P$, the assumptions on $\gamma^{(t)}$ and relation (A.2) that the terms (A.11) and (A.12) are both $o(\gamma^{(t)})$ for $t \to +\infty$. We now estimate (A.13):



E p 



a=t 
t-1 

E 

s=t 

t—1 



^[A 2 (l + e )]*— 1 



/ j?0) 



s=to 



\_ F( s+1 ) 



IIP*— ^u^ 1 ) 



A' < 



< 



K 



l-A 2 (l + e) 



to 



(A.15) 



where 



A' = max 1 1 it W I 



Pt •■= sup 

t>s>t 



p(t-i) p(s+i) 



We now concentrate on the component $i_0$ of the term (A.14). Using the spectral decomposition of $P$ and the assumptions on $u^{(t)}$ we can write,



E pt 

s=*o 



E( 



A 



j>2 h:t <Th<t-l 



E 



(-!)>' -t/'). 



F(r h - 1) 

If $t \in I'$, the above expression can be rewritten as

.t-r^ F(t-1) 



(A.16) 
(A.17) 



E(^)l E 

j>2 fc:t <T 2fc <t-l 



' t _ T2fc F(f-l) 
j A(r 2fe - 1) 



A 



Notice that 



! F(r 2k -1) X i 



(v — v ). 



Ffo k -1 - 1) 



F(r 2k - 1) 



1- A 



r 2 k-T2k-l F (T 2k - 1) 



F{T 2k ^ - 1) 



(we have used the fact that $\lambda_j(1+\varepsilon) < 1$ for all $j \ge 2$). To complete the proof we now proceed as follows. For a fixed $\delta > 0$, choose $t_0 \ge s_0$ in such a way that (A.15) is below $\delta/2$. Then, fix $t_\delta \ge t_0$ in such a way that the summation of (A.11) and (A.12) is below $\delta\gamma^{(t)}/2$ for $t \ge t_\delta$. It is now sufficient to define



,(*) - 



E( 



h)l E 

k:t <T 2k <t-l 



tt _ T2fc _ x F(t-1) 



F(r 2k - 1) 



A 



F(t-l) 



F{T 2k -i - 1) 



and $b_\delta^{(t)}$ equal to the sum of the terms (A.11), (A.12), and (A.13). □

PROPOSITION A.12 (Proof of F3)). There exists $t''' \in \mathbb{N}$ such that

$$\hat\theta^{(t)} \notin \bigcup_{\omega\in\{\alpha,\beta+,\beta-\}^V}\mathcal{L}^\varepsilon_{\omega,\omega^+} \tag{A.18}$$

for all $t \ge t'''$.

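Proof. In view of the results in Propositions A.8 and A.10, and the fact that $\bar\theta^{(t+1)} - \bar\theta^{(t)}$ goes to 0 for $t \to +\infty$, the negation of (A.18) yields that there exists $\omega \in \Gamma$ such that $\hat\theta^{(t)} \in \mathcal{L}^\varepsilon_{\omega,\omega^+}$ for $t$ large enough. Now, if $\hat\theta^{(t)} \in \mathcal{L}^\varepsilon_{\omega,\omega^+} \cap \Theta_\omega$ (or if $\hat\theta^{(t)} \in \mathcal{L}^\varepsilon_{\omega,\omega^+} \cap \Theta_{\omega^+}$) for $t$ sufficiently large, a straightforward application of (A.7) would imply that $\hat\theta^{(t)}$ would necessarily exit $\mathcal{L}^\varepsilon_{\omega,\omega^+}$ in finite time. Therefore, it must hold that $\hat\theta^{(t)}$ keeps switching, for large $t$, between $\mathcal{L}^\varepsilon_{\omega,\omega^+} \cap \Theta_\omega$ and $\mathcal{L}^\varepsilon_{\omega,\omega^+} \cap \Theta_{\omega^+}$.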

From Lemma A. 3 and Corollary A. 4 we can write 

#(*+!) _ QWl = 

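Define now

$$I' := \{t \mid \hat\theta^{(t)} \in \Theta_\omega\}, \qquad I'' := \{t \mid \hat\theta^{(t)} \in \Theta_{\omega^+}\}$$

and put $v' = 1/\omega_{i_0}^2$ and $v'' = 1/(\omega^+_{i_0})^2$. From Lemma A.7, and applying Lemma A.11 to $\mu^{(t)}$ and $\nu^{(t)}$, we get that for $t \in I'$ sufficiently large, it holds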

^+i)_^) = c (*) 7 ( % ^ (A.19) 

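If $\hat\theta(\omega)\mathbf{1} > \Theta_\omega \cap \Lambda$, then also, by Lemma A.9, $\hat\theta(\omega^+)\mathbf{1} > \Theta_\omega \cap \Lambda$. This, using (A.7), would imply that $\hat\theta^{(t)}$ would necessarily exit $\mathcal{L}^\varepsilon_{\omega,\omega^+}$ in finite time. Therefore, we must have $\hat\theta(\omega)\mathbf{1} < \Theta_\omega \cap \Lambda$. Hence, $\hat\theta(\omega) - \bar\theta^{(t)} < 0$. Moreover, it is easy to check that in any case $(v' - v'')(y_{i_0} - \bar\theta^{(t)}) < 0$. Recall now the definition of the constant $\tilde c$ in (A.6) and notice that, since $\hat\theta^{(t)} \in \mathcal{L}^\varepsilon_{\omega,\omega^+}$,

$$c^{(t)}\gamma^{(t)}\big(\hat\theta(\omega) - \bar\theta^{(t)}\big) < -\frac{\alpha^2\tilde c}{4\beta^2}\,\gamma^{(t)}.$$

Choose now $\delta$ such that $\delta < \alpha^2\tilde c/16\beta^2$ and $t \ge t_\delta$ such that $r^{(t)} < \delta\gamma^{(t)}$. It then follows from (A.19) that for $t \in I'$ and $t \ge t_\delta$, it holds

$$\hat\theta_{i_0}^{(t+1)} - \hat\theta_{i_0}^{(t)} < -\frac{\alpha^2\tilde c}{8\beta^2}\,\gamma^{(t)} < 0.$$

This says that as long as $\hat\theta^{(t)} \in \Theta_\omega$, its $i_0$-th component decreases. But this entails that $\hat\theta^{(t)}$ can never leave $\Theta_\omega$, which contradicts the infinite switching assumption and thus implies the thesis. □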

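A.3. Proof of Theorem 4.1. Propositions A.8, A.10, and A.12 imply that there exists $\omega^{IA} \in \{\alpha,\beta\}^V$ such that $\hat\theta^{(t)} \in \Theta_{\omega^{IA}}$ for $t$ sufficiently large. This immediately implies that $\hat\omega^{(t)} = \omega^{IA}$ for $t$ sufficiently large. Corollary A.5 implies that $\hat\theta^{IA}\mathbf{1} = \lim_{t\to+\infty}\hat\theta^{(t)} = \hat\theta(\omega^{IA})\mathbf{1}$. Finally, since $\hat\theta(\omega^{IA})\mathbf{1} \in \Theta_{\omega^{IA}}$, we also have that $\omega^{IA} = \hat\omega(\hat\theta^{IA})$.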

Appendix B. Proof of concentration results.
it)
?(i) 



n 



At) 



+ o( 7 (t) ) 



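B.1. Preliminaries. For a more efficient parametrization of the stationary points, we introduce the notation:

$$\omega \in \{\alpha,\beta\}^V, \qquad \Theta_\omega := \{x \in \mathbb{R} : |x - y_i| < \delta \Leftrightarrow \omega_i = \alpha\}. \tag{B.1}$$

It is then straightforward to check from (3.11) that the set of local maxima $S_N$ can be represented as

$$S_N := \{\theta = \hat\theta(\omega) \mid \omega \in \{\alpha,\beta\}^V,\ \hat\theta(\omega) \in \Theta_\omega\}. \tag{B.2}$$

Since

$$\Theta_\omega \ne \emptyset \ \Leftrightarrow\ \omega = \hat\omega(x) \text{ for some } x \in \mathbb{R}, \tag{B.3}$$

for analysing the set $S_N$ we can restrict to consider $\omega$ of type $\omega = \hat\omega(x)$. Consider the sequence of random functions $\gamma_N(x) := \hat\theta(\hat\omega(x))$.

From (3.8), applying the strong law of large numbers, we immediately get that

$$\lim_{N\to+\infty}\gamma_N(x) = \gamma_\infty(x) := \frac{\mathbb{E}\big(y_1\,\hat\omega_1(x)^{-2}\big)}{\mathbb{E}\big(\hat\omega_1(x)^{-2}\big)}. \tag{B.4}$$

Something stronger can indeed be said by a standard use of Chernoff bound [26]: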
Lemma B.1. For every $\varepsilon > 0$, there exists $q < 1$ such that, for any $x \in \mathbb{R}$,

$$\mathbb{P}\big(|\gamma_N(x) - \gamma_\infty(x)| > \varepsilon\big) \le 2q^N.$$

Proof. Let $a_i = y_i\,\hat\omega_i(x)^{-2}$ and $b_i = \hat\omega_i(x)^{-2}$ with $i \in \{1, \dots, N\}$ and let $a$ and $b$ denote the corresponding expected values.

By Chernoff's bound and by Hoeffding's inequality we have, respectively, that



> e 2 < 2q. 



with 



(12 



(B.5) 



Fix ei < and e 2 < 



then 



' (|W(x) -yoo0*0| > e) 



< 



l v^ iv 



S <7l +92 + - L {/3 4 (eib+|a|e 2 )>£} 



> e 



where the last step follows by the way $\varepsilon_1$ and $\varepsilon_2$ have been chosen.

There is still a point to be understood: in our derivation $q_1$ and $q_2$ depend on the choice of $x$ through $a$ and $b$. However, it is immediate to check that $a$ and $b$ are both bounded in $x$. This allows us to conclude. □
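The concentration stated in Lemma B.1 can be illustrated by Monte Carlo. The sketch below assumes the Gaussian mixture measurement model $y_i = \theta^* + \omega_i\eta_i$ with standard normal $\eta_i$, and uses hypothetical parameter values ($\theta^* = 0$, $\alpha = 0.3$, $\beta = 3$, $p = 0.75$, $\delta = 1$); it compares the spread of $\gamma_N(x)$ over independent draws for two values of $N$.

```python
import numpy as np

def gamma_N(x, y, alpha, beta, delta):
    """theta_hat(omega_hat(x)): sensors within delta of x get weight alpha^-2, others beta^-2."""
    w = np.where(np.abs(x - y) < delta, alpha ** -2.0, beta ** -2.0)
    return float(np.sum(w * y) / np.sum(w))

def sample_measurements(N, rng, theta_star=0.0, alpha=0.3, beta=3.0, p=0.75):
    """Draw N measurements from the assumed two-component Gaussian mixture."""
    sigma = np.where(rng.random(N) < p, alpha, beta)
    return theta_star + sigma * rng.normal(size=N)

def spread(N, x=0.5, trials=200, seed=0, alpha=0.3, beta=3.0, delta=1.0):
    """Standard deviation of gamma_N(x) across independent samples of size N."""
    rng = np.random.default_rng(seed)
    values = [
        gamma_N(x, sample_measurements(N, rng, alpha=alpha, beta=beta), alpha, beta, delta)
        for _ in range(trials)
    ]
    return float(np.std(values))

print("spread for N = 100:  ", spread(100))
print("spread for N = 10000:", spread(10000))
```

The lemma gives an exponential tail bound, which is stronger than what this experiment shows; the simulation only makes visible that the fluctuations of $\gamma_N(x)$ shrink as $N$ grows.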
From (B.4) it is immediate to see that $\gamma_\infty$ is a bounded function of class $C^1$ and it has an important property which will be useful later on.

Lemma B.2. There exists a constant $C > 0$ such that

$$x - \gamma_\infty(x) \ge C(x - \theta^*) \quad \text{if } x \in (\theta^*, +\infty)$$
$$\gamma_\infty(x) - x \ge C(\theta^* - x) \quad \text{if } x \in (-\infty, \theta^*)$$
$$\gamma_\infty(\theta^*) = \theta^*.$$

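Proof. If $x \in (\theta^*, +\infty)$ and $f$ is the density of each $y_i$ (a mixture of two Gaussians) then

$$\begin{aligned}
x - \gamma_\infty(x) &= \frac{\frac{1}{\beta^2}\int_{\mathbb{R}}(x-t)f(t)\,dt + \big(\frac{1}{\alpha^2} - \frac{1}{\beta^2}\big)\int_{x-\delta}^{x+\delta}(x-t)f(t)\,dt}{\mathbb{E}\big(\hat\omega_1(x)^{-2}\big)} \\
&\ge \frac{\frac{1}{\beta^2}\int_{\mathbb{R}}(x-t)f(t)\,dt}{\mathbb{E}\big(\hat\omega_1(x)^{-2}\big)}
\end{aligned}$$

where the last inequality follows from the fact that $\int_{x-\delta}^{x+\delta}(x-t)f(t)\,dt \ge 0$. We conclude that

$$x - \gamma_\infty(x) \ge \frac{\alpha^2}{\beta^2}(x - \theta^*) > 0.$$

The second statement, for $x \in (-\infty, \theta^*)$, can be verified in a completely analogous way. The third statement then simply follows by continuity. □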
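The property in Lemma B.2 can be checked with a closed-form evaluation of $\gamma_\infty$. The sketch below assumes the density $f$ is the mixture $p\,\mathcal{N}(\theta^*, \alpha^2) + (1-p)\,\mathcal{N}(\theta^*, \beta^2)$ and uses hypothetical parameter values; the helper names are not from the paper.

```python
import math

def _pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def window_moments(x, delta, mean, sd):
    """P(|y - x| < delta) and E[y 1{|y - x| < delta}] for y ~ N(mean, sd^2)."""
    a = (x - delta - mean) / sd
    b = (x + delta - mean) / sd
    prob = _cdf(b) - _cdf(a)
    first = mean * prob - sd * (_pdf(b) - _pdf(a))
    return prob, first

def gamma_inf(x, theta_star=2.0, alpha=0.3, beta=3.0, p=0.75, delta=1.0):
    """E[y w(x)] / E[w(x)], where w(x) = alpha^-2 inside the window and beta^-2 outside."""
    prob = first = 0.0
    for weight, sd in ((p, alpha), (1.0 - p, beta)):
        pr, fi = window_moments(x, delta, theta_star, sd)
        prob += weight * pr
        first += weight * fi
    extra = alpha ** -2 - beta ** -2
    numerator = beta ** -2 * theta_star + extra * first
    denominator = beta ** -2 + extra * prob
    return numerator / denominator

for d in (0.1, 0.5, 1.0, 3.0, 10.0):
    x = 2.0 + d
    print(d, x - gamma_inf(x))
```

Under these assumptions the printed gaps are positive, and $\gamma_\infty(\theta^*) = \theta^*$ holds by symmetry of the mixture around $\theta^*$.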
We now come to a key result.

Lemma B.3. For any fixed $\varepsilon > 0$, there exist $q \in (0,1)$ and $\chi > 0$ such that

$$\mathbb{P}\big(\gamma_N(x) \in \Theta_{\hat\omega(x)}\big) \le \chi q^N \tag{B.6}$$

for all $x$ such that $|x - \theta^*| > \varepsilon$.

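Proof. We assume $x > \theta^* + \varepsilon$ (the other case $x < \theta^* - \varepsilon$ being completely equivalent). Fix $\varepsilon' \in (0, C\varepsilon)$ where $C$ was defined in Lemma B.2 and estimate as follows

$$\mathbb{P}\big(\gamma_N(x) \in \Theta_{\hat\omega(x)}\big) \le \mathbb{P}\big(\gamma_N(x) \in \Theta_{\hat\omega(x)},\ |\gamma_N(x) - \gamma_\infty(x)| \le \varepsilon'\big) + \mathbb{P}\big(|\gamma_N(x) - \gamma_\infty(x)| > \varepsilon'\big). \tag{B.7}$$

Using Lemma B.2 we get

$$\{|\gamma_N(x) - \gamma_\infty(x)| \le \varepsilon'\} \subseteq \{\gamma_N(x) \le x - (C\varepsilon - \varepsilon')\}.$$

Thus

$$\{\gamma_N(x) \in \Theta_{\hat\omega(x)},\ |\gamma_N(x) - \gamma_\infty(x)| \le \varepsilon'\} \subseteq \{\nexists\, i : y_i \in (\gamma_N(x) - \delta,\ \gamma_N(x) - \delta + \min\{C\varepsilon - \varepsilon', \delta\})\}$$

and, consequently, the first term in (B.7) can be estimated as

$$\Big(1 - \int_{\gamma_N(x)-\delta}^{\gamma_N(x)-\delta+\min\{C\varepsilon-\varepsilon',\delta\}} f(y)\,dy\Big)^N \tag{B.8}$$

where $f(y)$ is the density of each $y_i$. Considering now that $f(y)$ is bounded away from 0 on any bounded interval, that $|\gamma_N(x) - \gamma_\infty(x)| \le \varepsilon'$ and that $\gamma_\infty(x)$ is a bounded function, we deduce that the right hand side of (B.8) can be uniformly bounded as $\tilde q^N$ for some $\tilde q \in (0,1)$. Substituting in (B.7), and using Lemma B.1 we finally obtain the thesis. □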

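B.2. Proof of Theorem 4.2. Define

$$A_N(\varepsilon) := \{\exists\,\omega \in \{\alpha,\beta\}^V : \hat\theta(\omega) \in \Theta_\omega,\ |\hat\theta(\omega) - \theta^*| > \varepsilon\}$$

for any $\varepsilon > 0$ and

$$B_1 := \{\exists\, i \in V : |y_i - \theta^*| > N\}$$
$$B_2 := \{\exists\,(i,j) \in V\times V,\ i \ne j : |y_i - y_j| < N^{-4}\}$$
$$B_3 := \{\exists\,(i,j) \in V\times V : |y_i - y_j| \in (2\delta, 2\delta + N^{-4})\}$$

and estimate $\mathbb{P}(A_N(\varepsilon)) \le \mathbb{P}(A_N(\varepsilon) \cap B_1^c \cap B_2^c \cap B_3^c) + \mathbb{P}(B_1) + \mathbb{P}(B_2) + \mathbb{P}(B_3)$. Standard considerations allow to upper bound the probability of each event $B_i$ by a common term $K/N^2$. We now focus on the estimation of the first term. The crucial point is that the condition $B_1^c \cap B_2^c \cap B_3^c$ allows us to reinforce condition (B.3) in the sense that all $\omega$ for which $\Theta_\omega \ne \emptyset$ can be obtained as $\omega = \hat\omega(x)$ as $x$ varies in a set whose cardinality is polynomial in $N$. Specifically, define

$$Z = \{\zeta_j = \theta^* - N - \delta + jN^{-4} : j \in \mathbb{N},\ j \le j_{\max}\}$$

where $j_{\max} := \lceil N^4(2N + 2\delta)\rceil$ and notice that, assuming that the $y_i$'s satisfy $B_2^c \cap B_3^c$, we have that $\hat\omega(\zeta_j)$ and $\hat\omega(\zeta_{j+1})$ differ in at most one component and that $\hat\omega(x) \in \{\hat\omega(\zeta_j), \hat\omega(\zeta_{j+1})\}$ for every $x \in [\zeta_j, \zeta_{j+1}]$. Moreover, because of $B_1^c$, we have that $\hat\omega(x)_i = \hat\omega(\zeta_0)_i = \beta$ for all $x \le \theta^* - N - \delta$ and for all $i$. Similarly, $\hat\omega(x)_i = \hat\omega(\zeta_{j_{\max}})_i = \beta$ for all $x \ge \theta^* + N + \delta$ and for all $i$. In other terms, under the assumption that the $y_i$'s satisfy $B_1^c \cap B_2^c \cap B_3^c$, it holds $\{\omega \in \{\alpha,\beta\}^V \mid \Theta_\omega \ne \emptyset\} = \{\hat\omega(x) \mid x \in Z\}$. Hence,

$$\begin{aligned}
\mathbb{P}\big(A_N(\varepsilon) \cap B_1^c \cap B_2^c \cap B_3^c\big) &\le \mathbb{P}\Big(\bigcup_{\zeta\in Z}\{\gamma_N(\zeta) \in \Theta_{\hat\omega(\zeta)},\ |\gamma_N(\zeta) - \theta^*| > \varepsilon\}\Big) \\
&\le \mathbb{P}\Big(\bigcup_{\zeta\in Z}\{\gamma_N(\zeta) \in \Theta_{\hat\omega(\zeta)},\ |\gamma_\infty(\zeta) - \theta^*| > \varepsilon/2\}\Big) + \mathbb{P}\Big(\bigcup_{\zeta\in Z}\{|\gamma_N(\zeta) - \gamma_\infty(\zeta)| \ge \varepsilon/2\}\Big).
\end{aligned}$$

Notice that, because of the continuity of $\gamma_\infty$, there exists $\tilde\varepsilon > 0$ such that $|\gamma_\infty(\zeta) - \theta^*| > \varepsilon/2 \Rightarrow |\zeta - \theta^*| > \tilde\varepsilon$. We can then use Lemma B.3,

$$\mathbb{P}\Big(\bigcup_{\zeta\in Z}\{\gamma_N(\zeta) \in \Theta_{\hat\omega(\zeta)},\ |\gamma_\infty(\zeta) - \theta^*| > \varepsilon/2\}\Big) \le |Z|\sup_{\zeta : |\zeta - \theta^*| > \tilde\varepsilon}\mathbb{P}\big(\gamma_N(\zeta) \in \Theta_{\hat\omega(\zeta)}\big) \le cN^5q^N$$

where $q$ is the one coming from Lemma B.3 relative to $\tilde\varepsilon$ and $c$ is a constant multiple of the corresponding $\chi$. Putting together all the estimations we have obtained and using Lemma B.1, we finally obtain that there exists $\chi' > 0$ such that $\mathbb{P}(A_N(\varepsilon)) \le \chi'/N^2$. Using the Borel-Cantelli Lemma and standard arguments, it follows now that the relation (4.5) holds almost surely.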

It remains to show convergence in the mean square sense. For this we need to go back to the form (3.10) of the derivative of $L(\theta, \hat\omega(\theta))$. The key observation is that the second additive term in the right hand side of (3.10) can be bounded uniformly in modulus by some constant $C$. If we denote $\bar y_N = N^{-1}\sum_i y_i$, this implies that the function is increasing for $\theta < \bar y_N - \beta^2 C$ and decreasing for $\theta > \bar y_N + \beta^2 C$. Hence, necessarily,

$$|\xi - \bar y_N| \le \beta^2 C \qquad \forall\xi \in S_N. \tag{B.9}$$

On the other hand, by the law of large numbers, $\bar y_N$ almost surely converges to $\theta^*$ and this implies, by the previous part of the theorem, that $\max_{\xi\in S_N}|\xi - \bar y_N|$ converges to 0. This, together with (B.9), yields $\mathbb{E}\max_{\xi\in S_N}|\xi - \bar y_N|^2 \to 0$ for $N \to +\infty$. Since by the ergodic theorem also $\mathbb{E}|\bar y_N - \theta^*|^2 \to 0$ for $N \to +\infty$, the proof is complete.

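B.3. Proof of Proposition 4.4. We prove it for $\omega^{IA}$ (the other verification being completely equivalent). If $\sigma \in \{\alpha,\beta\}$, we define

$$f(\theta,\sigma) := \begin{cases} 1 - \dfrac{1}{\sqrt{2\pi\sigma^2}}\displaystyle\int_{\theta-\delta}^{\theta+\delta} e^{-\frac{(s-\theta^*)^2}{2\sigma^2}}\,ds & \text{if } \sigma = \alpha \\[2ex] \dfrac{1}{\sqrt{2\pi\sigma^2}}\displaystyle\int_{\theta-\delta}^{\theta+\delta} e^{-\frac{(s-\theta^*)^2}{2\sigma^2}}\,ds & \text{if } \sigma = \beta \end{cases}$$

(notice that $f$ does not depend on $i$). We can compute

$$\frac{1}{N}\mathbb{E}\,d_H(\omega^{IA}, \omega^*) = \frac{1}{N}\sum_i \mathbb{P}\big(\omega_i^{IA} \ne \omega_i^*\big) = p\,\mathbb{E}f(\hat\theta^{IA}, \alpha) + (1-p)\,\mathbb{E}f(\hat\theta^{IA}, \beta).$$

Since $f(\theta,\sigma)$ is a $C^1$ function of $\theta$, we immediately obtain that

$$\big|\mathbb{E}f(\hat\theta^{IA}, \sigma) - \mathbb{E}f(\theta^*, \sigma)\big| \le C\,\mathbb{E}|\hat\theta^{IA} - \theta^*|$$

and, by Corollary 4.3, this last expression converges to 0, for $N \to +\infty$. Hence,

$$\lim_{N\to+\infty}\frac{1}{N}\mathbb{E}\,d_H(\omega^{IA}, \omega^*) = p\,\mathbb{E}f(\theta^*, \alpha) + (1-p)\,\mathbb{E}f(\theta^*, \beta).$$

Straightforward computation now proves the thesis.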






